THE FORMAL DEFINITION OF REFERENCE PRIORS

By James O. Berger, José M. Bernardo and Dongchu Sun

Duke University, Universitat de València and University of
Missouri-Columbia

Reference analysis produces objective Bayesian inference, in the 
sense that inferential statements depend only on the assumed model 
and the available data, and the prior distribution used to make an 
inference is least informative in a certain information-theoretic sense. 
Reference priors have been rigorously defined in specific contexts and 
heuristically defined in general, but a rigorous general definition has 
been lacking. We produce a rigorous general definition here and then 
show how an explicit expression for the reference prior can be
obtained under very weak regularity conditions. The explicit expression
can be used to derive new reference priors both analytically and
numerically.

1. Introduction and notation. 

1.1. Background and goals. There is a considerable body of conceptual 
and theoretical literature devoted to identifying appropriate procedures for 
the formulation of objective priors; for relevant pointers see Section 5.6 in 
Bernardo and Smith [13], Datta and Mukerjee [20], Bernardo [11], Berger 
[3], Ghosh, Delampady and Samanta [23] and references therein.
Reference analysis, introduced by Bernardo [10] and further developed by Berger
and Bernardo [4, 5, 6, 7], and Sun and Berger [42], has been one of the 
most utilized approaches to developing objective priors; see the references 
in Bernardo [11]. 

Reference analysis uses information-theoretical concepts to make precise 
the idea of an objective prior which should be maximally dominated by the 
data, in the sense of maximizing the missing information (to be precisely
defined later) about the parameter. The original formulation of reference 
priors in the paper by Bernardo [10] was largely informal. In continuous one 
parameter problems, heuristic arguments were given to justify an explicit 
expression in terms of the expectation under sampling of the logarithm of 
the asymptotic posterior density, which reduced to Jeffreys prior (Jeffreys 
[31, 32]) under asymptotic posterior normality. In multiparameter problems 
it was argued that one should not maximize the joint missing information 
but proceed sequentially, thus avoiding known problems such as
marginalization paradoxes. Berger and Bernardo [7] gave more precise definitions of
this sequential reference process, but restricted consideration to continuous 
multiparameter problems under asymptotic posterior normality. Clarke and 
Barron [17] established regularity conditions under which joint maximization 
of the missing information leads to Jeffreys multivariate priors. Ghosal and 
Samanta [27] and Ghosal [26] provided explicit results for reference priors 
in some types of nonregular models. 
This paper has three goals. 



Goal 1. Make precise the definition of the reference prior. This has two
different aspects.

• Applying Bayes theorem to improper priors is not obviously justifiable.
Formalizing when this is legitimate is desirable, and is considered in
Section 2.

• Previous attempts at a general definition of reference priors have had 
heuristic features, especially in situations in which the reference prior is 
improper. Replacing the heuristics with a formal definition is desirable, 
and is done in Section 3. 



Goal 2. Present a simple constructive formula for a reference prior.
Indeed, for a model described by density $p(\mathbf{x} \mid \theta)$, where $\mathbf{x}$ is the complete
data vector and $\theta$ is a continuous unknown parameter, the formula for the
reference prior, $\pi(\theta)$, will be shown to be

$$\pi(\theta) = \lim_{k\to\infty} \frac{f_k(\theta)}{f_k(\theta_0)}, \qquad
f_k(\theta) = \exp\left\{ \int p(\mathbf{x}^{(k)} \mid \theta)\, \log[\pi^*(\theta \mid \mathbf{x}^{(k)})]\, d\mathbf{x}^{(k)} \right\},$$

where $\theta_0$ is an interior point of the parameter space $\Theta$, $\mathbf{x}^{(k)} = \{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$
stands for $k$ conditionally independent replications of $\mathbf{x}$, and $\pi^*(\theta \mid \mathbf{x}^{(k)})$
is the posterior distribution corresponding to some fixed, largely arbitrary
prior $\pi^*(\theta)$.
The interesting thing about this expression is that it holds (under mild
conditions) for any type of continuous parameter model, regardless of the 
asymptotic nature of the posterior. This formula is established in Section 
4.1, and various illustrations of its use are given. 

A second use of the expression is that it allows straightforward
computation of the reference prior numerically. This is illustrated in Section 4.2
for a difficult nonregular problem and for a problem for which analytical 
determination of the reference prior seems very difficult. 

Goal 3. To make precise the most common practical rationale for use 
of improper objective priors, which proceeds as follows: 

• In reality, we are always dealing with bounded parameters so that the real
parameter space should, say, be some compact set $\Theta_0$.

• It is often only known that the bounds are quite large, in which case it is
difficult to accurately ascertain which $\Theta_0$ to use.

• This difficulty can be surmounted if we can pass to the unbounded space
$\Theta$ and show that the analysis on this space would yield essentially the
same answer as the analysis on any very large compact $\Theta_0$.

Establishing that the analysis on $\Theta$ is a good approximation from the
reference theory viewpoint requires establishing two facts:

1. The reference prior distribution on $\Theta$, when restricted to $\Theta_0$, is the
reference prior on $\Theta_0$.

2. The reference posterior distribution on $\Theta$ is an appropriate limit of the
reference posterior distributions on an increasing sequence of compact
sets $\{\Theta_i\}_{i=1}^{\infty}$ converging to $\Theta$.

Indicating how these two facts can be verified is the third goal of the paper. 

1.2. Notation. Attention here is limited mostly to one-parameter problems
with a continuous parameter, but the ideas are extendable to the
multiparameter case through the sequential scheme of Berger and Bernardo [7].

It is assumed that probability distributions may be described through
probability density functions, either with respect to Lebesgue measure or
counting measure. No distinction is made between a random quantity and the
particular values that it may take. Bold italic roman fonts are used for
observable random vectors (typically data) and italic greek fonts for
unobservable random quantities (typically parameters); lower case is used for
variables and upper case calligraphic for their domain sets. Moreover, the
standard mathematical convention of referring to functions, say $f_x$ and $g_x$
of $x \in \mathcal{X}$, respectively by $f(x)$ and $g(x)$, will be used throughout. Thus, the
conditional probability density of data $\mathbf{x} \in \mathcal{X}$ given $\theta$ will be represented
by $p(\mathbf{x} \mid \theta)$, with $p(\mathbf{x} \mid \theta) \ge 0$ and $\int_{\mathcal{X}} p(\mathbf{x} \mid \theta)\, d\mathbf{x} = 1$, and the reference
posterior distribution of $\theta \in \Theta$ given $\mathbf{x}$ will be represented by $\pi(\theta \mid \mathbf{x})$, with
$\pi(\theta \mid \mathbf{x}) \ge 0$ and $\int_{\Theta} \pi(\theta \mid \mathbf{x})\, d\theta = 1$. This admittedly imprecise notation will
greatly simplify the exposition. If the random vectors are discrete, these
functions naturally become probability mass functions, and integrals over
their values become sums. Density functions of specific distributions are
denoted by appropriate names. Thus, if $x$ is an observable random quantity
with a normal distribution of mean $\mu$ and variance $\sigma^2$, its probability
density function will be denoted $\mathrm{N}(x \mid \mu, \sigma^2)$; if the posterior distribution of $\lambda$
is Gamma with mean $a/b$ and variance $a/b^2$, its probability density
function will be denoted $\mathrm{Ga}(\lambda \mid a, b)$. The indicator function on a set $C$ will be
denoted by $1_C$.

Reference prior theory is based on the use of logarithmic divergence, often 
called the Kullback-Leibler divergence. 

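Definition 1. The logarithmic divergence of a probability density $\tilde{p}(\theta)$
of the random vector $\theta \in \Theta$ from its true probability density $p(\theta)$, denoted
by $\kappa\{\tilde{p} \mid p\}$, is

$$\kappa\{\tilde{p} \mid p\} = \int_{\Theta} p(\theta)\, \log\frac{p(\theta)}{\tilde{p}(\theta)}\, d\theta,$$

provided the integral (or the sum) is finite.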
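As a small numerical illustration of Definition 1 in the discrete case (where the integral becomes a sum), the following sketch evaluates the logarithmic divergence for two probability vectors. The function name `log_divergence` and the example vectors are illustrative assumptions, not part of the paper.

```python
import math


def log_divergence(p_true, p_approx):
    """Discrete version of kappa{p_approx | p_true}: sum of p_true * log(p_true / p_approx)."""
    total = 0.0
    for pt, pa in zip(p_true, p_approx):
        if pt == 0.0:
            continue  # a term with zero true probability contributes nothing
        if pa == 0.0:
            return math.inf  # the sum is not finite, so the divergence is undefined
        total += pt * math.log(pt / pa)
    return total


if __name__ == "__main__":
    p_true = [0.5, 0.3, 0.2]      # hypothetical "true" distribution
    p_approx = [0.4, 0.4, 0.2]    # hypothetical approximation
    print(log_divergence(p_true, p_approx))
    print(log_divergence(p_approx, p_true))  # differs: the divergence is not symmetric
```

Swapping the two arguments changes the value, which is the asymmetry referred to later when a symmetrized divergence is mentioned.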

The properties of $\kappa\{\tilde{p} \mid p\}$ have been extensively studied; pioneering works
include Gibbs [22], Shannon [38], Good [24, 25], Kullback and Leibler [35], 
Chernoff [15], Jaynes [29, 30], Kullback [34] and Csiszar [18, 19]. 

Definition 2 (Logarithmic convergence). A sequence of probability
density functions $\{p_i\}_{i=1}^{\infty}$ converges logarithmically to a probability density
$p$ if, and only if, $\lim_{i\to\infty} \kappa(p \mid p_i) = 0$.

2. Improper and permissible priors. 

2.1. Justifying posteriors from improper priors. Consider a model $\mathcal{M} =
\{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$ and a strictly positive prior function $\pi(\theta)$. (We
restrict attention to strictly positive functions because any believably objective
prior would need to have strictly positive density, and this restriction
eliminates many technical details.) When $\pi(\theta)$ is improper, so that $\int_{\Theta} \pi(\theta)\, d\theta$
diverges, Bayes theorem no longer applies, and the use of the formal
posterior density

$$\pi(\theta \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \theta)\, \pi(\theta)}{\int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta} \tag{2.1}$$

must be justified, even when $\int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta < \infty$ so that $\pi(\theta \mid \mathbf{x})$ is a
proper density.

The most convincing justifications revolve around showing that $\pi(\theta \mid \mathbf{x})$
is a suitable limit of posteriors obtained from proper priors. A variety of 
versions of such arguments exist; cf. Stone [40, 41] and Heath and Sudderth 
[28]. Here, we consider approximations based on restricting the prior to an 
increasing sequence of compact sets and using logarithmic convergence to 
define the limiting process. The main motivation is, as mentioned in the 
introduction, that objective priors are often viewed as being priors that will 
yield a good approximation to the analysis on the "true but difficult to 
specify" large bounded parameter space. 

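Definition 3 (Approximating compact sequence). Consider a parametric
model $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$ and a strictly positive continuous
function $\pi(\theta)$, $\theta \in \Theta$, such that, for all $\mathbf{x} \in \mathcal{X}$, $\int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta < \infty$. An
approximating compact sequence of parameter spaces is an increasing
sequence of compact subsets of $\Theta$, $\{\Theta_i\}_{i=1}^{\infty}$, converging to $\Theta$. The corresponding
sequence of posteriors with support on $\Theta_i$, defined as $\{\pi_i(\theta \mid \mathbf{x})\}_{i=1}^{\infty}$, with
$\pi_i(\theta \mid \mathbf{x}) \propto p(\mathbf{x} \mid \theta)\, \pi_i(\theta)$, $\pi_i(\theta) = c_i^{-1} \pi(\theta)\, 1_{\Theta_i}(\theta)$ and $c_i = \int_{\Theta_i} \pi(\theta)\, d\theta$, is called the
approximating sequence of posteriors to the formal posterior $\pi(\theta \mid \mathbf{x})$.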

Notice that the renormalized restrictions $\pi_i(\theta)$ of $\pi(\theta)$ to the $\Theta_i$ are proper
[because the $\Theta_i$ are compact and $\pi(\theta)$ is continuous]. The following theorem
shows that the posteriors resulting from these proper priors do converge, in
the sense of logarithmic convergence, to the posterior $\pi(\theta \mid \mathbf{x})$.

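Theorem 1. Consider model $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$ and a strictly
positive continuous function $\pi(\theta)$, such that $\int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta < \infty$, for all
$\mathbf{x} \in \mathcal{X}$. For any approximating compact sequence of parameter spaces, the
corresponding approximating sequence of posteriors converges
logarithmically to the formal posterior $\pi(\theta \mid \mathbf{x}) \propto p(\mathbf{x} \mid \theta)\, \pi(\theta)$.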

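Proof. To prove that $\kappa\{\pi(\cdot \mid \mathbf{x}) \mid \pi_i(\cdot \mid \mathbf{x})\}$ converges to zero, define the
predictive densities $p_i(\mathbf{x}) = \int_{\Theta_i} p(\mathbf{x} \mid \theta)\, \pi_i(\theta)\, d\theta$ and $p(\mathbf{x}) = \int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta$
(which has been assumed to be finite). Using for the posteriors the
expressions provided by Bayes theorem yields

$$\kappa\{\pi(\cdot \mid \mathbf{x}) \mid \pi_i(\cdot \mid \mathbf{x})\}
= \int_{\Theta_i} \pi_i(\theta \mid \mathbf{x})\, \log\frac{\pi_i(\theta \mid \mathbf{x})}{\pi(\theta \mid \mathbf{x})}\, d\theta
= \int_{\Theta_i} \pi_i(\theta \mid \mathbf{x})\, \log\frac{p(\mathbf{x})}{p_i(\mathbf{x})\, c_i}\, d\theta
= \log\frac{p(\mathbf{x})}{p_i(\mathbf{x})\, c_i}
= \log\frac{\int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta}{\int_{\Theta_i} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta}.$$

But the last expression converges to zero if, and only if,

$$\lim_{i\to\infty} \int_{\Theta_i} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta = \int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta,$$

and this follows from the monotone convergence theorem. □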
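The closed form in the proof can be checked numerically in a simple case. The sketch below assumes, purely for illustration, one observation $x$ from $\mathrm{N}(x \mid \theta, 1)$, the flat prior $\pi(\theta) = 1$ and the compact sets $\Theta_i = [-i, i]$; then $\int_{\Theta} p(x \mid \theta)\, d\theta = 1$ and the divergence reduces to the negative logarithm of the normal mass that falls inside $\Theta_i$. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def divergence_to_formal_posterior(x, i):
    """kappa{pi(.|x) | pi_i(.|x)} for x ~ N(theta, 1), pi(theta) = 1, Theta_i = [-i, i].

    By the proof of Theorem 1 this equals
    log( int_Theta p(x|theta) dtheta / int_{Theta_i} p(x|theta) dtheta ),
    and the numerator is 1 for this model.
    """
    mass_in_compact = normal_cdf(i - x) - normal_cdf(-i - x)
    return math.log(1.0 / mass_in_compact)


if __name__ == "__main__":
    x = 3.0  # an arbitrary observed value
    for i in (1, 2, 4, 6, 10):
        print(i, divergence_to_formal_posterior(x, i))
```

For a fixed $x$ the printed values decrease toward zero as $i$ grows, which is the pointwise (in $x$) logarithmic convergence that the theorem asserts.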

It is well known that logarithmic convergence implies convergence in $L_1$,
which implies uniform convergence of probabilities, so Theorem 1 could, at
first sight, be invoked to justify the formal use of virtually any improper prior 
in Bayes theorem. As illustrated below, however, logarithmic convergence of 
the approximating posteriors is not necessarily good enough. 

Example 1 (Fraser, Monette and Ng [21]). Consider the model, with
both discrete data and parameter space,

$$\mathcal{M} = \{p(x \mid \theta) = 1/3,\; x \in \{[\theta/2],\, 2\theta,\, 2\theta + 1\},\; \theta \in \{1, 2, \ldots\}\},$$

where $[u]$ denotes the integer part of $u$, and $[1/2]$ is separately defined as 1.
Fraser, Monette and Ng [21] show that the naive improper prior $\pi(\theta) = 1$
produces a posterior $\pi(\theta \mid x) \propto p(x \mid \theta)$ which is strongly inconsistent, leading
to credible sets for $\theta$ given by $\{2x, 2x + 1\}$ which have posterior probability
2/3 but frequentist coverage of only 1/3 for all $\theta$ values. Yet, choosing the
natural approximating sequence of compact sets $\Theta_i = \{1, \ldots, i\}$, it follows
from Theorem 1 that the corresponding sequence of posteriors converges
logarithmically to $\pi(\theta \mid x)$.

The difficulty shown by Example 1 lies in the fact that logarithmic
convergence is only pointwise convergence for given $\mathbf{x}$, which does not guarantee
that the approximating posteriors are accurate in any global sense over $\mathbf{x}$.
For that we turn to a stronger notion of convergence.

Definition 4 (Expected logarithmic convergence of posteriors).
Consider a parametric model $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$, a strictly positive
continuous function $\pi(\theta)$, $\theta \in \Theta$, and an approximating compact sequence
$\{\Theta_i\}$ of parameter spaces. The corresponding sequence of posteriors
$\{\pi_i(\theta \mid \mathbf{x})\}_{i=1}^{\infty}$ is said to be expected logarithmically convergent to the formal
posterior $\pi(\theta \mid \mathbf{x})$ if

$$\lim_{i\to\infty} \int_{\mathcal{X}} \kappa\{\pi(\cdot \mid \mathbf{x}) \mid \pi_i(\cdot \mid \mathbf{x})\}\, p_i(\mathbf{x})\, d\mathbf{x} = 0, \tag{2.2}$$

where $p_i(\mathbf{x}) = \int_{\Theta_i} p(\mathbf{x} \mid \theta)\, \pi_i(\theta)\, d\theta$.

This notion was first discussed (in the context of reference priors) in
Berger and Bernardo [7], and achieves one of our original goals: A prior
distribution satisfying this condition will yield a posterior that, on average
over $\mathbf{x}$, is a good approximation to the proper posterior that would result
from restriction to a large compact subset of the parameter space.

To some Bayesians, it might seem odd to worry about averaging the
logarithmic discrepancy over the sample space but, as will be seen, reference
priors are designed to be "noninformative" for a specified model, the notion 
being that repeated use of the prior with that model will be successful in 
practice. 

Example 2 (Fraser, Monette and Ng [21] continued). In Example 1, the
discrepancies $\kappa\{\pi(\cdot \mid x) \mid \pi_i(\cdot \mid x)\}$ between $\pi(\theta \mid x)$ and the posteriors
derived from the sequence of proper priors $\{\pi_i(\theta)\}_{i=1}^{\infty}$ converged to zero.
However, Berger and Bernardo [7] show that $\int_{\mathcal{X}} \kappa\{\pi(\cdot \mid x) \mid \pi_i(\cdot \mid x)\}\, p_i(x)\, dx \to
\log 3$ as $i \to \infty$, so that the expected logarithmic discrepancy does not go
to zero. Thus, the sequence of proper priors $\{\pi_i(\theta) = 1/i,\ \theta \in \{1, \ldots, i\}\}_{i=1}^{\infty}$
does not provide a good global approximation to the formal prior $\pi(\theta) = 1$,
providing one explanation of the paradox found by Fraser, Monette and Ng
[21].
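The posterior probability and coverage figures quoted for this model can be reproduced by direct enumeration. The sketch below uses exact rational arithmetic; the helper names and the truncation bound `theta_max` (needed only to enumerate a finite range of parameter values) are assumptions made for illustration. The coverage check is run for $\theta = 2, \ldots, 50$; the boundary value $\theta = 1$, where $[1/2]$ is specially defined, is left out of it.

```python
from fractions import Fraction


def support(theta):
    """Sample points x with p(x | theta) = 1/3; [1/2] is defined as 1."""
    half = 1 if theta == 1 else theta // 2
    return {half, 2 * theta, 2 * theta + 1}


def posterior(x, theta_max):
    """Formal posterior under pi(theta) = 1, enumerating theta = 1..theta_max.

    theta_max must be at least 2x + 1 so that no supporting theta is cut off.
    """
    thetas = [t for t in range(1, theta_max + 1) if x in support(t)]
    return {t: Fraction(1, len(thetas)) for t in thetas}


def credible_set(x):
    """The set {2x, 2x + 1} discussed in Example 1."""
    return {2 * x, 2 * x + 1}


def coverage(theta):
    """Sampling probability, given theta, that the credible set contains theta."""
    xs = support(theta)
    hits = sum(1 for x in xs if theta in credible_set(x))
    return Fraction(hits, len(xs))


if __name__ == "__main__":
    post = posterior(5, theta_max=50)
    print(sum(p for t, p in post.items() if t in credible_set(5)))  # posterior probability
    print(coverage(7))                                              # frequentist coverage
```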

Interestingly, for the improper prior $\pi(\theta) = 1/\theta$, the approximating
compact sequence considered above can be shown to yield posterior distributions
that expected logarithmically converge to $\pi(\theta \mid x) \propto \theta^{-1} p(x \mid \theta)$, so that this
is a good candidate objective prior for the problem. It is also shown in Berger
and Bernardo [7] that this prior has posterior confidence intervals with the
correct frequentist coverage.

Two potential generalizations are of interest. Definition 4 requires
convergence only with respect to one approximating compact sequence of
parameter spaces. It is natural to wonder what happens for other such
approximating sequences. We suspect, but have been unable to prove in general,
that convergence with respect to one sequence will guarantee convergence
with respect to any sequence. If true, this makes expected logarithmic
convergence an even more compelling property.

Related to this is the possibility of allowing not just an approximating
series of priors based on truncation to compact parameter spaces, but
instead allowing any approximating sequence of priors. Among the difficulties
in dealing with this is the need for a better notion of divergence that is
symmetric in its arguments. One possibility is the symmetrized form of the
logarithmic divergence in Bernardo and Rueda [12], but the analysis is
considerably more difficult.

2.2. Permissible priors. Based on the previous considerations, we
restrict consideration of possibly objective priors to those that satisfy the
expected logarithmic convergence condition, and formally define them as 
follows. (Recall that x represents the entire data vector.) 
Definition 5. A strictly positive continuous function $\pi(\theta)$ is a
permissible prior for model $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$ if:

1. for all $\mathbf{x} \in \mathcal{X}$, $\pi(\theta \mid \mathbf{x})$ is proper, that is, $\int_{\Theta} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta < \infty$;

2. for some approximating compact sequence, the corresponding posterior
sequence is expected logarithmically convergent to $\pi(\theta \mid \mathbf{x}) \propto p(\mathbf{x} \mid \theta)\, \pi(\theta)$.

The following theorem, whose proof is given in Appendix A, shows that,
for one observation from a location model, the objective prior $\pi(\theta) = 1$ is
permissible under mild conditions.

Theorem 2. Consider the model $\mathcal{M} = \{f(x - \theta), \theta \in \mathbb{R}, x \in \mathbb{R}\}$, where
$f(t)$ is a density function on $\mathbb{R}$. If, for some $\varepsilon > 0$,

$$\lim_{|t|\to\infty} |t|^{1+\varepsilon} f(t) = 0, \tag{2.3}$$

then $\pi(\theta) = 1$ is a permissible prior for the location model $\mathcal{M}$.

Example 3 (A nonpermissible constant prior in a location model).
Consider the location model $\mathcal{M} = \{p(x \mid \theta) = f(x - \theta), \theta \in \mathbb{R}, x > \theta + e\}$, where
$f(t) = t^{-1} (\log t)^{-2}$, $t > e$. It is shown in Appendix B that, if $\pi(\theta) = 1$, then
$\int_{\mathcal{X}} \kappa\{\pi(\theta \mid x) \mid \pi_0(\theta \mid x)\}\, p_0(x)\, dx = \infty$ for any compact set $\Theta_0 = [a, b]$ with
$b - a > 1$; thus, $\pi(\theta) = 1$ is not a permissible prior for $\mathcal{M}$. Note that this
model does not satisfy (2.3).

This is an interesting example because we are still dealing with a location
density, so that $\pi(\theta) = 1$ is still the invariant (Haar) prior and, as such,
satisfies numerous nice properties such as being exact frequentist matching (i.e.,
a Bayesian $100(1 - \alpha)\%$ credible set will also be a frequentist $100(1 - \alpha)\%$
confidence set; cf. equation (6.22) in Berger [2]). This is in stark contrast to
the situation with the Fraser, Monette and Ng example. However, the basic 
fact remains that posteriors from uniform priors on large compact sets do 
not seem here to be well approximated (in terms of logarithmic divergence) 
by a uniform prior on the full parameter space. The suggestion is that this 
is a situation in which assessment of the "true" bounded parameter space is 
potentially needed. 

Of course, a prior might be permissible for a larger sample size, even if 
it is not permissible for the minimal sample size. For instance, we suspect 
that $\pi(\theta) = 1$ is permissible for any location model having two or more
independent observations. 

The condition in the definition of permissibility that the posterior must 
be proper is not vacuous, as the following example shows. 
Example 4 (Mixture model). Let $\mathbf{x} = \{x_1, \ldots, x_n\}$ be a random sample
from the mixture $p(x_i \mid \theta) = \tfrac{1}{2} \mathrm{N}(x_i \mid \theta, 1) + \tfrac{1}{2} \mathrm{N}(x_i \mid 0, 1)$, and consider the
uniform prior function $\pi(\theta) = 1$. Since the likelihood function is bounded below
by $2^{-n} \prod_{j=1}^{n} \mathrm{N}(x_j \mid 0, 1) > 0$, the integrated likelihood $\int_{-\infty}^{\infty} p(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta =
\int_{-\infty}^{\infty} p(\mathbf{x} \mid \theta)\, d\theta$ will diverge. Hence, the corresponding formal posterior is
improper, and therefore the uniform prior is not a permissible prior function
for this model. It can be shown that Jeffreys prior for this mixture model
has the shape of an inverted bell, with a minimum value 1/2 at $\theta = 0$; hence,
it is also bounded from below and is, therefore, not a permissible prior for
this model either.
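The stated shape of the Jeffreys prior for this mixture can be checked numerically for a single observation. The sketch below approximates the Fisher information $i(\theta) = \int (\partial p / \partial\theta)^2 / p \, dx$ with a plain trapezoid rule and takes its square root; the function names, grid step and integration range are illustrative assumptions rather than anything prescribed by the paper.

```python
import math


def normal_pdf(x, mean):
    """Density of N(x | mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fisher_information(theta, step=0.01, pad=12.0):
    """Trapezoid-rule approximation of i(theta) for p = N(theta,1)/2 + N(0,1)/2."""
    lo = min(0.0, theta) - pad
    hi = max(0.0, theta) + pad
    n = int(round((hi - lo) / step))
    total = 0.0
    for j in range(n + 1):
        x = lo + j * step
        p = 0.5 * normal_pdf(x, theta) + 0.5 * normal_pdf(x, 0.0)
        dp = 0.5 * (x - theta) * normal_pdf(x, theta)  # derivative of p in theta
        weight = 0.5 if j in (0, n) else 1.0           # trapezoid end-point weights
        total += weight * dp * dp / p
    return total * step


def jeffreys_prior(theta):
    """Square root of the Fisher information for one observation."""
    return math.sqrt(fisher_information(theta))


if __name__ == "__main__":
    for theta in (-6.0, -2.0, 0.0, 2.0, 6.0):
        print(theta, jeffreys_prior(theta))
```

The values dip to about 1/2 at $\theta = 0$ and level off near $\sqrt{1/2}$ once the two components are well separated, so the function stays bounded away from zero, which is the property used in the example.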

Example 4 is noteworthy because it is very rare for the Jeffreys prior 
to yield an improper posterior in univariate problems. It is also of interest 
because there is no natural objective prior available for the problem. (There 
are data-dependent objective priors: see Wasserman [43].) 

Theorem 2 can easily be modified to apply to models that can be
transformed into a location model.

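Corollary 1. Consider $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \theta \in \Theta, \mathbf{x} \in \mathcal{X}\}$. If there are
monotone functions $y = y(x)$ and $\phi = \phi(\theta)$ such that $p(y \mid \phi) = f(y - \phi)$ is a
location model and there exists $\varepsilon > 0$ such that $\lim_{|t|\to\infty} |t|^{1+\varepsilon} f(t) = 0$, then
$\pi(\theta) = |\phi'(\theta)|$ is a permissible prior function for $\mathcal{M}$.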

The most frequent transformation is the log transformation, which
converts a scale model into a location model. Indeed, this transformation yields
the following direct analogue of Theorem 2. 

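Corollary 2. Consider $\mathcal{M} = \{p(x \mid \theta) = \theta^{-1} f(|x|/\theta), \theta > 0, x \in \mathbb{R}\}$,
a scale model where $f(s)$, $s > 0$, is a density function. If, for some $\varepsilon > 0$,

$$\lim_{|t|\to\infty} |t|^{1+\varepsilon}\, e^{t} f(e^{t}) = 0, \tag{2.4}$$

then $\pi(\theta) = \theta^{-1}$ is a permissible prior function for the scale model $\mathcal{M}$.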

Example 5 (Exponential data). If $x$ is an observation from an
exponential density, (2.4) becomes $|t|^{1+\varepsilon}\, e^{t} \exp(-e^{t}) \to 0$, as $|t| \to \infty$, which is
true. From Corollary 2, $\pi(\theta) = \theta^{-1}$ is a permissible prior; indeed, $\pi_i(\theta) =
(2i)^{-1} \theta^{-1}$, $e^{-i} \le \theta \le e^{i}$, is expected logarithmically convergent to $\pi(\theta)$.

Example 6 (Uniform data). Let $x$ be one observation from the uniform
distribution $\mathcal{M} = \{\mathrm{Un}(x \mid 0, \theta) = \theta^{-1}, x \in [0, \theta], \theta > 0\}$. This is a scale
density, and equation (2.4) becomes $|t|^{1+\varepsilon}\, e^{t}\, 1_{\{0 < e^{t} < 1\}} \to 0$, as $|t| \to \infty$, which is
indeed true. Thus, $\pi(\theta) = \theta^{-1}$ is a permissible prior function for $\mathcal{M}$.
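Condition (2.4) is easy to probe numerically for the two densities of Examples 5 and 6. The following sketch evaluates $|t|^{1+\varepsilon} e^{t} f(e^{t})$ at increasingly extreme values of $t$; it is only a sanity check of the limit, not a proof, and the function names and the choice $\varepsilon = 0.5$ are illustrative assumptions.

```python
import math


def scale_condition_term(t, f, eps):
    """The quantity |t|^(1+eps) * e^t * f(e^t) that (2.4) requires to vanish."""
    s = math.exp(t)
    return abs(t) ** (1.0 + eps) * s * f(s)


def exponential_density(s):
    """f(s) = exp(-s), s > 0 (standard exponential)."""
    return math.exp(-s)


def uniform_density(s):
    """f(s) = 1 on (0, 1) and 0 elsewhere (standard uniform)."""
    return 1.0 if 0.0 < s < 1.0 else 0.0


if __name__ == "__main__":
    eps = 0.5  # any positive value may be tried
    for t in (-50.0, -20.0, -5.0, 5.0, 20.0, 50.0):
        print(t,
              scale_condition_term(t, exponential_density, eps),
              scale_condition_term(t, uniform_density, eps))
```

Both columns shrink toward zero in either tail: for large positive $t$ the density factor kills the term, and for large negative $t$ the factor $e^{t}$ does.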
The examples showing permissibility were for a single observation.
Pleasantly, it is enough to establish permissibility for a single observation or, more
generally, for the sample size necessary for posterior propriety of $\pi(\theta \mid \mathbf{x})$,
because of the following theorem, which shows that expected logarithmic
discrepancy is monotonically nonincreasing in sample size.

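Theorem 3 (Monotone expected logarithmic discrepancy). Let $\mathcal{M} =
\{p(\mathbf{x}_1, \mathbf{x}_2 \mid \theta) = p(\mathbf{x}_1 \mid \theta)\, p(\mathbf{x}_2 \mid \mathbf{x}_1, \theta), \mathbf{x}_1 \in \mathcal{X}_1, \mathbf{x}_2 \in \mathcal{X}_2, \theta \in \Theta\}$ be a
parametric model. Consider a continuous improper prior $\pi(\theta)$ satisfying $m(\mathbf{x}_1) =
\int_{\Theta} p(\mathbf{x}_1 \mid \theta)\, \pi(\theta)\, d\theta < \infty$ and $m(\mathbf{x}_1, \mathbf{x}_2) = \int_{\Theta} p(\mathbf{x}_1, \mathbf{x}_2 \mid \theta)\, \pi(\theta)\, d\theta < \infty$. For any
compact set $\Theta_0 \subset \Theta$, let $\pi_0(\theta) = \pi(\theta)\, 1_{\Theta_0}(\theta) / \int_{\Theta_0} \pi(\theta)\, d\theta$. Then,

$$\iint_{\mathcal{X}_1 \times \mathcal{X}_2} \kappa\{\pi(\cdot \mid \mathbf{x}_1, \mathbf{x}_2) \mid \pi_0(\cdot \mid \mathbf{x}_1, \mathbf{x}_2)\}\, m_0(\mathbf{x}_1, \mathbf{x}_2)\, d\mathbf{x}_1\, d\mathbf{x}_2
\le \int_{\mathcal{X}_1} \kappa\{\pi(\cdot \mid \mathbf{x}_1) \mid \pi_0(\cdot \mid \mathbf{x}_1)\}\, m_0(\mathbf{x}_1)\, d\mathbf{x}_1, \tag{2.5}$$

where, for $\theta \in \Theta_0$,

$$\pi_0(\theta \mid \mathbf{x}_1, \mathbf{x}_2) = \frac{p(\mathbf{x}_1, \mathbf{x}_2 \mid \theta)\, \pi(\theta)}{m_0(\mathbf{x}_1, \mathbf{x}_2)}, \qquad
m_0(\mathbf{x}_1, \mathbf{x}_2) = \int_{\Theta_0} p(\mathbf{x}_1, \mathbf{x}_2 \mid \theta)\, \pi(\theta)\, d\theta,$$

$$\pi_0(\theta \mid \mathbf{x}_1) = \frac{p(\mathbf{x}_1 \mid \theta)\, \pi(\theta)}{m_0(\mathbf{x}_1)}, \qquad
m_0(\mathbf{x}_1) = \int_{\Theta_0} p(\mathbf{x}_1 \mid \theta)\, \pi(\theta)\, d\theta.$$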

Proof. The proof of this theorem is given in Appendix C. □ 

As an aside, the above result suggests that, as the sample size grows, the 
convergence of the posterior to normality given in Clarke [16] is monotone. 

3. Reference priors. 

3.1. Definition of reference priors. Key to the definition of reference
priors is Shannon expected information (Shannon [38] and Lindley [36]).

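Definition 6 (Expected information). The information to be expected
from one observation from model $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$, when the
prior for $\theta$ is $q(\theta)$, is

$$I\{q \mid \mathcal{M}\} = \iint_{\mathcal{X} \times \Theta} p(\mathbf{x} \mid \theta)\, q(\theta)\, \log\frac{p(\theta \mid \mathbf{x})}{q(\theta)}\, d\mathbf{x}\, d\theta
= \int_{\mathcal{X}} p(\mathbf{x})\, \kappa\{q \mid p(\cdot \mid \mathbf{x})\}\, d\mathbf{x}, \tag{3.1}$$

where $p(\theta \mid \mathbf{x}) = p(\mathbf{x} \mid \theta)\, q(\theta) / p(\mathbf{x})$ and $p(\mathbf{x}) = \int_{\Theta} p(\mathbf{x} \mid \theta)\, q(\theta)\, d\theta$.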
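For a finite parameter space and a finite sample space, the double integral in Definition 6 becomes a double sum that can be evaluated directly. The sketch below does this for a toy two-point model; the model, the priors and the function name `expected_information` are illustrative assumptions and do not come from the paper.

```python
import math


def expected_information(likelihood, prior):
    """Discrete analogue of I{q | M}.

    likelihood[t][x] is p(x | theta_t) and prior[t] is q(theta_t).
    Returns the double sum of p(x|theta) q(theta) log[p(theta|x) / q(theta)].
    """
    n_x = len(likelihood[0])
    marginal = [sum(prior[t] * likelihood[t][x] for t in range(len(prior)))
                for x in range(n_x)]
    info = 0.0
    for t, q_t in enumerate(prior):
        for x in range(n_x):
            joint = q_t * likelihood[t][x]
            if joint > 0.0:
                posterior = joint / marginal[x]  # p(theta | x) by Bayes theorem
                info += joint * math.log(posterior / q_t)
    return info


if __name__ == "__main__":
    # Toy model: theta in {0, 1}; x equals theta with probability 0.9.
    likelihood = [[0.9, 0.1], [0.1, 0.9]]
    print(expected_information(likelihood, [0.5, 0.5]))    # diffuse prior
    print(expected_information(likelihood, [0.99, 0.01]))  # sharp prior
```

In this toy case the sharper prior yields the smaller expected information, in line with the remark that follows the definition.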

Note that $\mathbf{x}$ here refers to the entire observation vector. It can have any
dependency structure whatsoever (e.g., it could consist of $n$ normal random
variables with mean zero, variance one and correlation $\theta$). Thus, when we
refer to a model henceforth, we mean the probability model for the actual
complete observation vector. Although somewhat nonstandard, this convention
is necessary here because reference prior theory requires the introduction of
(artificial) independent replications of the entire experiment.

The amount of information $I\{q \mid \mathcal{M}\}$ to be expected from observing $\mathbf{x}$
from $\mathcal{M}$ depends on the prior $q(\theta)$: the sharper the prior the smaller the
amount of information to be expected from the data. Consider now the
information $I\{q \mid \mathcal{M}^k\}$ which may be expected from $k$ independent
replications of $\mathcal{M}$. As $k \to \infty$, the sequence of realizations $\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$ would
eventually provide any missing information about the value of $\theta$. Hence, as
$k \to \infty$, $I\{q \mid \mathcal{M}^k\}$ provides a measure of the missing information about $\theta$
associated with the prior $q(\theta)$. Intuitively, a reference prior will be a
permissible prior which maximizes the missing information about $\theta$ within the class
$\mathcal{P}$ of priors compatible with any assumed knowledge about the value of $\theta$.

With a continuous parameter space, the missing information $I\{q \mid \mathcal{M}^k\}$
will typically diverge as $k \to \infty$, since an infinite amount of information
would be required to learn the value of $\theta$. Likewise, the expected
information is typically not defined on an unbounded set. These two difficulties
are overcome with the following definition, which formalizes the heuristics
described in Bernardo [10] and in Berger and Bernardo [7].

Definition 7 [Maximizing Missing Information (MMI) Property]. Let
$\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta \subset \mathbb{R}\}$ be a model with one continuous
parameter, and let $\mathcal{P}$ be a class of prior functions for $\theta$ for which $\int_{\Theta} p(\mathbf{x} \mid \theta)\, p(\theta)\, d\theta <
\infty$. The function $\pi(\theta)$ is said to have the MMI property for model $\mathcal{M}$ given
$\mathcal{P}$ if, for any compact set $\Theta_0 \subset \Theta$ and any $p \in \mathcal{P}$,

$$\lim_{k\to\infty} \{ I\{\pi_0 \mid \mathcal{M}^k\} - I\{p_0 \mid \mathcal{M}^k\} \} \ge 0, \tag{3.2}$$

where $\pi_0$ and $p_0$ are, respectively, the renormalized restrictions of $\pi(\theta)$ and
$p(\theta)$ to $\Theta_0$.

The restriction of the definition to a compact set typically ensures the
existence of the missing information for given $k$. That the missing information
will diverge for large $k$ is handled by the device of simply insisting that the
missing information for the reference prior be larger, as $k \to \infty$, than the
missing information for any other candidate $p(\theta)$.
Definition 8. A function $\pi(\theta) = \pi(\theta \mid \mathcal{M}, \mathcal{P})$ is a reference prior for
model $\mathcal{M}$ given $\mathcal{P}$ if it is permissible and has the MMI property.

Implicit in this definition is that the reference prior on $\Theta$ will also be the
reference prior on any compact subset $\Theta_0$. This is an attractive property that
is often stated as the practical way to proceed when dealing with a restricted 
parameter space, but here it is simply a consequence of the definition. 

Although we feel that a reference prior needs to be both permissible and 
have the MMI property, the MMI property is considerably more important. 
Thus, others have defined reference priors only in relation to this property, 
and Definition 7 is compatible with a number of these previous definitions 
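in particular cases. Clarke and Barron [17] proved that, under appropriate
regularity conditions, essentially those which guarantee asymptotic posterior
normality, the prior which asymptotically maximizes the information to be
expected by repeated sampling from $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta \subset \mathbb{R}\}$ is the
Jeffreys prior,

$$\pi(\theta) = i(\theta)^{1/2}, \qquad
i(\theta) = -\int_{\mathcal{X}} p(\mathbf{x} \mid \theta)\, \frac{\partial^2}{\partial\theta^2} \log p(\mathbf{x} \mid \theta)\, d\mathbf{x}, \tag{3.3}$$

which, hence, is the reference prior under those conditions. Similarly, Ghosal
and Samanta [27] gave conditions under which the prior which asymptotically
maximizes the information to be expected by repeated sampling from
nonregular models of the form $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in S(\theta), \theta \in \Theta \subset \mathbb{R}\}$, where the
support $S(\theta)$ is either monotonically decreasing or monotonically increasing
in $\theta$, is

$$\pi(\theta) = \left| \int_{S(\theta)} p(\mathbf{x} \mid \theta)\, \frac{\partial}{\partial\theta} \log p(\mathbf{x} \mid \theta)\, d\mathbf{x} \right|, \tag{3.4}$$

which is, therefore, the reference prior under those conditions.

3.2. Properties of reference priors. Some important properties of
reference priors (generally regarded as required properties for any sensible
procedure to derive objective priors) can be immediately deduced from their
definition.

Theorem 4 (Independence of sample size). If data $\mathbf{x} = \{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$
consists of a random sample of size $n$ from model $\mathcal{M} = \{p(\mathbf{y} \mid \theta), \mathbf{y} \in \mathcal{Y}, \theta \in
\Theta\}$ with reference prior $\pi(\theta \mid \mathcal{M}, \mathcal{P})$, then $\pi(\theta \mid \mathcal{M}^n, \mathcal{P}) = \pi(\theta \mid \mathcal{M}, \mathcal{P})$, for
any fixed sample size $n$.

Proof. This follows from the additivity of the information measure.
Indeed, for any sample size $n$ and number of replicates $k$, $I\{q \mid \mathcal{M}^{nk}\} =
n I\{q \mid \mathcal{M}^k\}$. □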
Note, however, that Theorem 4 requires x to be a random sample from the
assumed model. If observations are dependent, as in time series or spatial 
models, the reference prior may well depend on the sample size (see, e.g., 
Berger and Yang [9] and Berger, de Oliveira and Sanso [8]). 

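Theorem 5 (Compatibility with sufficient statistics). Consider the model
$\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$ with sufficient statistic $\mathbf{t} = \mathbf{t}(\mathbf{x}) \in \mathcal{T}$, and let
$\mathcal{M}_t = \{p(\mathbf{t} \mid \theta), \mathbf{t} \in \mathcal{T}, \theta \in \Theta\}$ be the corresponding model in terms of $\mathbf{t}$. Then,
$\pi(\theta \mid \mathcal{M}, \mathcal{P}) = \pi(\theta \mid \mathcal{M}_t, \mathcal{P})$.

Proof. This follows because expected information is invariant under
such transformation, so that, for all $k$, $I\{q \mid \mathcal{M}^k\} = I\{q \mid \mathcal{M}_t^k\}$. □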

Theorem 6 (Consistency under reparametrization). Consider the model
$\mathcal{M}_1 = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta\}$, let $\phi(\theta)$ be an invertible transformation of $\theta$,
and let $\mathcal{M}_2$ be the model parametrized in terms of $\phi$. Then, $\pi(\phi \mid \mathcal{M}_2, \mathcal{P})$ is
the prior density induced from $\pi(\theta \mid \mathcal{M}_1, \mathcal{P})$ by the appropriate probability
transformation.

Proof. This follows immediately from the fact that the expected
information is also invariant under one-to-one reparametrizations, so that, for all
$k$, $I\{q_1 \mid \mathcal{M}_1^k\} = I\{q_2 \mid \mathcal{M}_2^k\}$, where $q_2(\phi) = q_1(\theta) \times |\partial\theta/\partial\phi|$. □

3.3. Existence of the expected information. The definition of a reference
prior is clearly only useful if the $I\{\pi_0 \mid \mathcal{M}^k\}$ and $I\{p_0 \mid \mathcal{M}^k\}$ are finite for
the (artificial) replications of $\mathcal{M}$. It is useful to write down conditions under
which this will be so.

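Definition 9 (Standard prior functions). Let $\mathcal{P}_s$ be the class of strictly
positive and continuous prior functions on $\Theta$ which have proper formal
posterior distributions so that, when $p \in \mathcal{P}_s$,

$$\forall \theta \in \Theta,\; p(\theta) > 0; \qquad \forall \mathbf{x} \in \mathcal{X},\; \int_{\Theta} p(\mathbf{x} \mid \theta)\, p(\theta)\, d\theta < \infty. \tag{3.5}$$

We call these the standard prior functions.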

This will be the class of priors that we typically use to define the reference
prior. The primary motivation for requiring a standard prior to be positive
and continuous on $\Theta$ is that any prior not satisfying these conditions would
not be accepted as being a reasonable candidate for an objective prior.

Definition 10 (Standard models). Let $\mathcal{M} = \{p(\mathbf{x} \mid \theta), \mathbf{x} \in \mathcal{X}, \theta \in \Theta \subset \mathbb{R}\}$
be a model with continuous parameter, and let $\mathbf{t}_k = \mathbf{t}_k(\mathbf{x}_1, \ldots, \mathbf{x}_k) \in \mathcal{T}_k$ be
any sufficient statistic for the (artificial) $k$ replications of the experiment.
($\mathbf{t}_k$ could be just the observation vectors themselves.) The model $\mathcal{M}$ is said
to be standard if, for any prior function $p(\theta) \in \mathcal{P}_s$ and any compact set $\Theta_0$,

$$I\{p_0 \mid \mathcal{M}^k\} < \infty, \tag{3.6}$$

where $p_0(\theta)$ is the proper prior obtained by restricting $p(\theta)$ to $\Theta_0$.

There are a variety of conditions under which satisfaction of (3.6) can be
assured. Here is one of the simplest, useful when all $p(\mathbf{t}_k \mid \theta)$, for $\theta \in \Theta_0$,
have the same support.

Lemma 1. For $p(\theta) \in \mathcal{P}_s$ and any compact set $\Theta_0$, (3.6) is satisfied if,
for any $\theta \in \Theta_0$ and $\theta' \in \Theta_0$,

$$\int_{\mathcal{T}_k} p(\mathbf{t}_k \mid \theta)\, \log\frac{p(\mathbf{t}_k \mid \theta)}{p(\mathbf{t}_k \mid \theta')}\, d\mathbf{t}_k < \infty. \tag{3.7}$$

Proof. The proof is given in Appendix D. □

When the $p(\mathbf{t}_k \mid \theta)$ have different supports over $\theta \in \Theta_0$, the following
lemma, whose proof is given in Appendix E, can be useful to verify (3.6).

Lemma 2. For $p(\theta) \in \mathcal{P}_s$ and any compact set $\Theta_0$, (3.6) is satisfied if:

1. $H[p(\mathbf{t}_k \mid \theta)] = -\int_{\mathcal{T}_k} p(\mathbf{t}_k \mid \theta)\, \log[p(\mathbf{t}_k \mid \theta)]\, d\mathbf{t}_k$ is bounded below over $\Theta_0$.

2. $\int_{\mathcal{T}_k} p_0(\mathbf{t}_k)\, \log[p_0(\mathbf{t}_k)]\, d\mathbf{t}_k > -\infty$, where $p_0(\mathbf{t}_k)$ is the marginal likelihood
from the uniform prior, that is, $p_0(\mathbf{t}_k) = L(\Theta_0)^{-1} \int_{\Theta_0} p(\mathbf{t}_k \mid \theta)\, d\theta$, with
$L(\Theta_0)$ being the Lebesgue measure of $\Theta_0$.

4. Determining the reference prior.

4.1. An explicit expression for the reference prior. Definition 8 does not
provide a constructive procedure to derive reference priors. The following
theorem provides an explicit expression for the reference prior, under
certain mild conditions. Recall that $\mathbf{x}$ refers to the entire vector of observations
from the model, while $\mathbf{x}^{(k)} = (\mathbf{x}_1, \ldots, \mathbf{x}_k)$ refers to a vector of (artificial)
independent replicates of these vector observations from the model. Finally, let
$\mathbf{t}_k = \mathbf{t}_k(\mathbf{x}_1, \ldots, \mathbf{x}_k) \in \mathcal{T}_k$ be any sufficient statistic for the replicated
observations. While $\mathbf{t}_k$ could just be $\mathbf{x}^{(k)}$ itself, it is computationally convenient
to work with sufficient statistics if they are available.
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Theorem 7 (Explicit form of the reference prior). Assume a standard 
model M. = {p(x | 9), x G X, 9 G © C R} and the standard class V s of can- 
didate priors. Let it* (9) be a continuous strictly positive function such that 
the corresponding formal posterior 

(41) 7r{6ltk) = feP(t k \e),*(e)d9 

is proper and asymptotically consistent (see Appendix F), and define, for 
any interior point 9q of Q, 

(4.2) /fc(fl) = exp|jT_p(t fc |fl)log[ir*(e|t fc )]£ft fc } and 

(4.3) m=^W- v 

If (i) each fk(9) is continuous and, for any fixed 9 and suffciently large k, 
{fk (^)//fc (^o)} is either monotonic in k or is bounded above by some h(9) 
which is integrable on any compact set, and (ii) f(9) is a permissible prior 
function, then tt(9 \ Ai,V s ) = f{9) is a reference prior for model Ai and 
prior class V s . 

Proof. The proof of this theorem is given in Appendix F. □ 

Note that the choice of $\pi^*$ is essentially arbitrary; hence, $\pi^*$ can be
chosen for computational convenience. Also, the choice of $\theta_0$ is immaterial.
Finally, note that no compact set is mentioned in the theorem; that is, the
defined reference prior works simultaneously for all compact subsets of $\Theta$.

Example 7 (Location model). To allow for the dependent case, we write
the location model for data $x = (x_1, \ldots, x_n)$ as $f(x_1 - \theta, \ldots, x_n - \theta)$, where
we assume $\Theta = \mathbb{R}$. To apply Theorem 7, choose $\pi^*(\theta) = 1$. Then, because
of the translation invariance of the problem, it is straightforward to show
that (4.2) reduces to $f_k(\theta) = c_k$, not depending on $\theta$. It is immediate from
(4.3) that $f(\theta) = 1$, and condition (i) of Theorem 7 is also trivially satisfied.
[Note that this is an example of choosing $\pi^*(\theta)$ conveniently; any other
choice would have resulted in a much more difficult analysis.]
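
As a concrete check of the claim that (4.2) is constant for a location model, consider the simplest case. The worked equation below is an illustration added here, not part of the original example; it assumes a single normal observation with known unit variance, so that the sufficient statistic for $k$ replications is the sample mean $\bar{x}_k$ and, with $\pi^*(\theta) = 1$, the formal posterior is $N(\theta \mid \bar{x}_k, 1/k)$.

```latex
\begin{align*}
\log f_k(\theta)
  &= \int p(\bar{x}_k \mid \theta)\,
     \log\!\left[\sqrt{\tfrac{k}{2\pi}}\,
     \exp\!\left\{-\tfrac{k}{2}(\theta-\bar{x}_k)^2\right\}\right] d\bar{x}_k \\
  &= \tfrac{1}{2}\log\frac{k}{2\pi}
     - \frac{k}{2}\,E\!\left[(\bar{x}_k-\theta)^2 \mid \theta\right]
   = \tfrac{1}{2}\log\frac{k}{2\pi} - \tfrac{1}{2},
\end{align*}
% since E[(\bar{x}_k - \theta)^2 | \theta] = 1/k. Hence
% f_k(\theta) = \sqrt{k/(2\pi e)} = c_k, free of \theta, and (4.3) gives f(\theta) = 1.
```

The constant $c_k$ grows with $k$, but it cancels in the ratio $f_k(\theta)/f_k(\theta_0)$ of (4.3), which is why only the dependence on $\theta$ matters.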

It follows that, if the model is a standard model and $\pi(\theta) = 1$ is permissible
for the model [certainly satisfied if (2.3) holds], then $\pi(\theta) = 1$ is the reference
prior among the class of all standard priors. Note that additional
work is needed to verify that the model is a standard model. It is
typically easiest to verify (3.7), which is easy for most location families.

It is interesting that no knowledge of the asymptotic distribution of the
posterior is needed for this result. Thus, the conclusion applies equally to
the normal distribution and to the distribution with density $f(x - \theta) =
\exp\{-(x - \theta)\}$, for $x > \theta$, which is not asymptotically normal.

The key feature making the analysis for the location model so simple
was that (4.2) was a constant. A similar result will hold for any suitably
invariant statistical model if $\pi^*(\theta)$ is chosen to be the Haar density (or
right-Haar density in multivariable models); then, (4.2) becomes a constant
times the right-Haar prior. For instance, scale-parameter problems can be
handled in this way, although one can, of course, simply transform them
to a location parameter problem and apply the above result. For a
scale-parameter problem, the reference prior is, of course, $\pi(\theta) = \theta^{-1}$.

Example 8. A model for which nothing is known about reference priors
is the uniform model with support on $(a_1(\theta), a_2(\theta))$,

$$\mathcal{M} = \left\{\mathrm{Un}(x \mid a_1(\theta), a_2(\theta)) = \frac{1}{a_2(\theta) - a_1(\theta)},\ a_1(\theta) < x < a_2(\theta)\right\}, \tag{4.4}$$

where $\theta > \theta_0$ and $0 < a_1(\theta) < a_2(\theta)$ are both strictly monotonic increasing
functions on $\Theta = (\theta_0, \infty)$ with derivatives satisfying $0 < a_1'(\theta) < a_2'(\theta)$. This
is not a regular model, has no group invariance structure and does not belong
to the class of nonregular models analyzed in Ghosal and Samanta [27]. The
following theorem gives the reference prior for the model (4.4). Its proof is
given in Appendix G.



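Theorem 8. Consider the model (4.4). Define

$$b_j \equiv b_j(\theta) = \frac{a_2'(\theta) - a_1'(\theta)}{a_j'(\theta)}, \qquad j = 1, 2. \tag{4.5}$$

Then the reference prior of $\theta$ for the model (4.4) is

$$\pi(\theta) = \frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\,\exp\left\{b_1 + \frac{1}{b_1 - b_2}\left[b_1\psi\!\left(\frac{1}{b_1}\right) - b_2\psi\!\left(\frac{1}{b_2}\right)\right]\right\}, \tag{4.6}$$

where $\psi(z)$ is the digamma function defined by $\psi(z) = \frac{d}{dz}\log(\Gamma(z))$ for $z > 0$.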

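Example 9 [Uniform distribution on $(\theta, \theta^2)$, $\theta > 1$]. This is a special case
of Theorem 8, with $\theta_0 = 1$, $a_1(\theta) = \theta$ and $a_2(\theta) = \theta^2$. Then, $b_1 = 2\theta - 1$ and
$b_2 = (2\theta - 1)/(2\theta)$. It is easy to show that $b_2^{-1} = b_1^{-1} + 1$. For the digamma
function (see Boros and Moll [14]), $\psi(z + 1) = \psi(z) + 1/z$, for $z > 0$, so that
$\psi(1/b_1) = \psi(1/b_2) - b_1$. The reference prior (4.6) thus becomes

$$\pi(\theta) = \frac{2\theta - 1}{\theta(\theta - 1)}\exp\left\{b_1 + \frac{1}{b_1 - b_2}\left[b_1\psi\!\left(\frac{1}{b_2}\right) - b_1^2 - b_2\psi\!\left(\frac{1}{b_2}\right)\right]\right\}$$

$$= \frac{2\theta - 1}{\theta(\theta - 1)}\exp\left\{-\frac{b_1 b_2}{b_1 - b_2} + \psi\!\left(\frac{1}{b_2}\right)\right\}$$

$$\propto \frac{2\theta - 1}{\theta(\theta - 1)}\exp\left\{\psi\!\left(\frac{2\theta}{2\theta - 1}\right)\right\}, \tag{4.7}$$

the last equation following from the identity $b_1 b_2/(b_1 - b_2) = (b_2^{-1} - b_1^{-1})^{-1} = 1$.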
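
The reduction from (4.6) to (4.7) can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper names `digamma`, `prior_46` and `prior_47` are hypothetical (they do not come from the paper), and `digamma` is a hand-rolled approximation (upward recurrence followed by the usual asymptotic series), since the standard library does not provide one. Because (4.7) drops the constant factor $e^{-1}$, the ratio of the two expressions should not depend on $\theta$.

```python
import math


def digamma(z):
    """Approximate psi(z) for z > 0: shift z upward, then use the asymptotic series."""
    acc = 0.0
    while z < 6.0:              # recurrence psi(z) = psi(z + 1) - 1/z
        acc -= 1.0 / z
        z += 1.0
    w = 1.0 / (z * z)
    series = w * (1.0 / 12.0 - w * (1.0 / 120.0 - w / 252.0))
    return acc + math.log(z) - 0.5 / z - series


def prior_46(theta):
    """General expression (4.6) with a1(theta) = theta and a2(theta) = theta**2."""
    b1 = 2.0 * theta - 1.0
    b2 = (2.0 * theta - 1.0) / (2.0 * theta)
    lead = (2.0 * theta - 1.0) / (theta * theta - theta)
    bracket = b1 * digamma(1.0 / b1) - b2 * digamma(1.0 / b2)
    return lead * math.exp(b1 + bracket / (b1 - b2))


def prior_47(theta):
    """Simplified expression (4.7), defined up to a constant factor."""
    lead = (2.0 * theta - 1.0) / (theta * (theta - 1.0))
    return lead * math.exp(digamma(2.0 * theta / (2.0 * theta - 1.0)))


if __name__ == "__main__":
    for theta in (1.5, 2.0, 5.0, 20.0):
        b1 = 2.0 * theta - 1.0
        b2 = b1 / (2.0 * theta)
        # First number: the identity b1*b2/(b1 - b2); second: the ratio (4.6)/(4.7).
        print(theta, b1 * b2 / (b1 - b2), prior_46(theta) / prior_47(theta))
```

For every $\theta > 1$ the first printed quantity should be 1 up to rounding and the second should agree with $e^{-1}$, which is the constant absorbed by the proportionality sign in (4.7).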

4.2. Numerical computation of the reference prior. Analytical derivation 
of reference priors may be technically demanding in complex models. How- 
ever, Theorem 7 may also be used to obtain an approximation to the refer- 
ence prior through numerical evaluation of equation (4.2). Moderate values 
of k (to simulate the asymptotic posterior) will often yield a good approxi- 
mation to the reference prior. The appropriate pseudo code is: 

Algorithm.

1. Starting values:

   choose a moderate value for $k$;
   choose an arbitrary positive function $\pi^*(\theta)$, say $\pi^*(\theta) = 1$;
   choose the number $m$ of samples to be simulated.

2. For any given $\theta$ value, repeat, for $j = 1, \ldots, m$:

   simulate a random sample $\{x_{1j}, \ldots, x_{kj}\}$ of size $k$ from $p(x \mid \theta)$;
   compute numerically the integral $c_j = \int_{\Theta} \prod_{i=1}^{k} p(x_{ij} \mid \theta)\,\pi^*(\theta)\,d\theta$;
   evaluate $r_j(\theta) = \log[\prod_{i=1}^{k} p(x_{ij} \mid \theta)\,\pi^*(\theta)/c_j]$.

3. Compute $\pi(\theta) = \exp[m^{-1}\sum_{j=1}^{m} r_j(\theta)]$ and store the pair $\{\theta, \pi(\theta)\}$.

4. Repeat routines (2) and (3) for all $\theta$ values for which the pair $\{\theta, \pi(\theta)\}$
   is required.

If desired, a continuous approximation to $\pi(\theta)$ may easily be obtained from
the computed points using standard interpolation techniques.
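
The four steps above can be sketched in Python for the uniform model on $(\theta, \theta^2)$ of Example 9, with $\pi^*(\theta) = 1$. This is an illustrative sketch only, not the authors' code: the function names, the log-space trapezoid rule, and the default values of `k`, `m` and the grid size `n` are assumptions made here. For this model the likelihood of a sample, viewed as a function of the parameter $s$, is $(s^2 - s)^{-k}$ on $\sqrt{t_2} < s < t_1$ (with $t_1$ and $t_2$ the sample minimum and maximum) and zero elsewhere, which fixes the integration limits in step 2.

```python
import math
import random


def log_integral(log_f, lo, hi, n=200):
    """Log of the integral of exp(log_f) over [lo, hi] (trapezoid rule, log space)."""
    h = (hi - lo) / n
    vals = [log_f(lo + i * h) for i in range(n + 1)]
    top = max(vals)                       # factor out the largest term to avoid underflow
    weights = [math.exp(v - top) for v in vals]
    total = sum(weights) - 0.5 * (weights[0] + weights[-1])
    return top + math.log(total * h)


def reference_prior_uniform(theta, k=100, m=1000, rng=random):
    """Steps 2 and 3 for Un(x | theta, theta**2) with pi*(theta) = 1 (unnormalized)."""
    def log_lik(s):
        # log-likelihood of k observations, valid where sqrt(max) < s < min
        return -k * math.log(s * s - s)

    total = 0.0
    for _ in range(m):
        xs = [rng.uniform(theta, theta * theta) for _ in range(k)]
        lo, hi = math.sqrt(max(xs)), min(xs)      # support of the likelihood in s
        log_c = log_integral(log_lik, lo, hi)     # log of c_j
        total += log_lik(theta) - log_c           # r_j(theta)
    return math.exp(total / m)


if __name__ == "__main__":
    rng = random.Random(0)
    p2 = reference_prior_uniform(2.0, rng=rng)
    p4 = reference_prior_uniform(4.0, rng=rng)
    print("estimated pi(2)/pi(4):", p2 / p4)
```

Only ratios such as `reference_prior_uniform(2.0) / reference_prior_uniform(4.0)` are meaningful, since the reference prior is defined up to a multiplicative constant. The Monte Carlo error of each value decreases at the usual rate $m^{-1/2}$.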

We first illustrate the computation in an example for which the reference
prior is known, to enable comparison of the numerical accuracy of the
approximation.

Example 10 [Uniform distribution on $(\theta, \theta^2)$, continued]. Consider again
the uniform distribution on $(\theta, \theta^2)$ discussed in Example 9, where the
reference prior was analytically given in (4.7).

Figure 1 presents the reference prior numerically calculated with the
algorithm for nine $\theta$ values, uniformly log-spaced and rescaled to have $\pi(2) = 1$;
$m = 1000$ samples of $k = 500$ observations were used to compute each of the
nine $\{\theta_i, \pi(\theta_i)\}$ points. These nine points are clearly almost perfectly fitted
by the exact reference prior (4.7), shown by a continuous line; indeed, the
nine points were accurate to within four decimal places.

Fig. 1. Numerical reference prior for the uniform model on $[\theta, \theta^2]$.

This numerical computation was done before the analytic reference prior
was obtained for the problem, and a nearly perfect fit to the nine $\theta$ values
was obtained by the function $\pi(\theta) = 1/(\theta - 1)$, which was thus guessed to
be the actual reference prior. This guess was wrong, but note that (4.7) over
the computed range is indeed nearly proportional to $1/(\theta - 1)$.
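
How close (4.7) is to $1/(\theta - 1)$ can be examined directly. The sketch below uses only the Python standard library; `digamma` is a hand-rolled approximation, `ratio_to_guess` is a hypothetical helper name, and the grid of $\theta$ values is an arbitrary choice made here, not the nine points of Figure 1. The ratio of (4.7) to $1/(\theta - 1)$ equals $\{(2\theta - 1)/\theta\}\exp\{\psi(2\theta/(2\theta - 1))\}$, which tends to $e^{\psi(2)} \approx 1.53$ as $\theta \to 1$ and to $2e^{\psi(1)} \approx 1.12$ as $\theta \to \infty$, so it varies slowly with $\theta$ without being constant.

```python
import math


def digamma(z):
    """Approximate psi(z) for z > 0: shift z upward, then use the asymptotic series."""
    acc = 0.0
    while z < 6.0:              # recurrence psi(z) = psi(z + 1) - 1/z
        acc -= 1.0 / z
        z += 1.0
    w = 1.0 / (z * z)
    series = w * (1.0 / 12.0 - w * (1.0 / 120.0 - w / 252.0))
    return acc + math.log(z) - 0.5 / z - series


def ratio_to_guess(theta):
    """Expression (4.7) divided by the guessed prior 1/(theta - 1)."""
    arg = 2.0 * theta / (2.0 * theta - 1.0)
    return (2.0 * theta - 1.0) / theta * math.exp(digamma(arg))


if __name__ == "__main__":
    for theta in (1.1, 1.5, 2.0, 4.0, 16.0, 256.0):
        print(f"theta = {theta:7.1f}   ratio = {ratio_to_guess(theta):.4f}")
```

A slowly varying ratio of this kind is hard to distinguish from a constant on a short grid of simulated points, which is consistent with the initial guess described above.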

We now consider an example for which the reference prior is not known 
and, indeed, appears to be extremely difficult to determine analytically. 

Example 11 (Triangular distribution). The use of a symmetric triangular
distribution on (0, 1) can be traced back to the 18th century to Simpson
[39]. Schmidt [37] noticed that this pdf is the density of the mean of two
i.i.d. uniform random variables on the interval (0, 1).

The nonsymmetric standard triangular distribution on (0, 1),

$$p(x \mid \theta) = \begin{cases} 2x/\theta, & \text{for } 0 < x \le \theta, \\ 2(1 - x)/(1 - \theta), & \text{for } \theta < x < 1, \end{cases} \qquad 0 < \theta < 1,$$

was first studied by Ayyangar [1]. Johnson and Kotz [33] revisited
nonsymmetric triangular distributions in the context of modeling prices. The
triangular density has a unique mode at $\theta$ and satisfies $\Pr[x \le \theta] = \theta$, a
property that can be used to obtain an estimate of $\theta$ based on the
empirical distribution function. The nonsymmetric triangular distribution does
not possess a useful reduced sufficient statistic. Also, although $\log[p(x \mid \theta)]$
is differentiable for all $\theta$ values, the formal Fisher information function is
strictly negative, so Jeffreys prior does not exist.

Figure 2 presents a numerical calculation of the reference prior at thirteen
$\theta$ values, uniformly spaced on (0, 1) and rescaled to have $\pi(1/2) = 2/\pi$;
$m = 2500$ samples of $k = 2000$ observations were used to compute each of
the thirteen $\{\theta_i, \pi(\theta_i)\}$ points. Interestingly, these points are nearly perfectly
fitted by the (proper) prior $\pi(\theta) = \mathrm{Be}(\theta \mid 1/2, 1/2) \propto \theta^{-1/2}(1 - \theta)^{-1/2}$, shown
by a continuous line.

Fig. 2. Numerical reference prior for the triangular model.

Analytical derivation of the reference prior does not seem to be feasible in
this example, but there is an interesting heuristic argument which suggests
that the $\mathrm{Be}(\theta \mid 1/2, 1/2)$ prior is indeed the reference prior for the problem.
The argument begins by noting that, if $\tilde\theta_k$ is a consistent, asymptotically
sufficient estimator of $\theta$, one would expect that, for large $k$,

$$\int_{\mathcal{T}_k} p(t_k \mid \theta)\log[\pi_0(\theta \mid t_k)]\,dt_k \approx \int p(\tilde\theta_k \mid \theta)\log[\pi_0(\theta \mid \tilde\theta_k)]\,d\tilde\theta_k \approx \log[\pi_0(\theta \mid \tilde\theta_k)]\big|_{\tilde\theta_k = \theta},$$

since the sampling distribution of $\tilde\theta_k$ will concentrate on $\theta$. Thus, using (4.2)
and (4.3), the reference prior should be

$$\pi(\theta) = \pi_0(\theta \mid \tilde\theta_k)\big|_{\tilde\theta_k = \theta} \propto p(\tilde\theta_k \mid \theta)\big|_{\tilde\theta_k = \theta}. \tag{4.8}$$

For the triangular distribution, a consistent estimator of $\theta$ can be
obtained as the solution to the equation $F_k(t) = t$, where $F_k(t)$ is the
empirical distribution function corresponding to a random sample of size $k$.
Furthermore, one can show that this solution, $\theta_k^*$, is asymptotically normal
$N(\theta_k^* \mid \theta, s(\theta)/\sqrt{k})$, where $s(\theta) = \sqrt{\theta(1 - \theta)}$. Plugging this into (4.8) would
yield the $\mathrm{Be}(\theta \mid 1/2, 1/2)$ prior as the reference prior. To make this
argument precise, of course, one would have to verify that the above heuristic
argument holds and that $\theta_k^*$ is asymptotically sufficient.
5. Conclusions and generalizations. The formalization of the notions of
permissibility and the MMI property (the two keys to defining a reference
prior) is of interest in its own right, but happened to be a by-product of
the main goal, which was to obtain the explicit representation of a reference
prior given in Theorem 7. Because of this explicit representation, and as
illustrated in the examples following the theorem, one can:

• Have a single expression for calculating the reference prior, regardless of 
the asymptotic nature of the posterior distribution. 

• Avoid the need to do computations over approximating compact param- 
eter spaces. 

• Develop a fairly simple numerical technique for computing the reference 
prior in situations where analytic determination is too difficult. 

• Have, as immediate, the result that the reference prior on any compact 
subset of the parameter space is simply the overall reference prior con- 
strained to that set. 

The main limitation of the paper is the restriction to single parameter 
models. It would obviously be very useful to be able to generalize the results 
to deal with nuisance parameters. 

The results concerning permissibility essentially generalize immediately to 
the multi-parameter case. The MMI property (and hence formal definition 
of a reference prior) can also be generalized to the multi-parameter case, 
following Berger and Bernardo [7] (although note that there were heuristic 
elements to that generalization). The main motivation for this paper, how- 
ever, was the explicit representation for the reference prior that was given 
in Theorem 7, and, unfortunately, there does not appear to be an analogue 
of this explicit representation in the multi-parameter case. Indeed, we have 
found that any generalizations seem to require expressions that involve lim- 
its over approximating compact sets, precisely the feature of reference prior 
computation that we were seeking to avoid. 



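APPENDIX A: PROOF OF THEOREM 2

By the invariance of the model, $p(x) = \int_{\Theta} f(x - \theta)\pi(\theta)\,d\theta = 1$ and
$\pi(\theta \mid x) = f(x - \theta)$. To verify (ii) of Definition 5, choose $\Theta_i = [-i, i]$. Then
$\pi_i(\theta \mid x) = f(x - \theta)/[2i\,p_i(x)]$, $\theta \in \Theta_i$, where

$$p_i(x) = \frac{1}{2i}\int_{-i}^{i} f(x - \theta)\,d\theta = \frac{1}{2i}[F(x + i) - F(x - i)],$$

with $F(x) = \int_{-\infty}^{x} f(t)\,dt$. The logarithmic discrepancy between $\pi_i(\theta \mid x)$ and
$\pi(\theta \mid x)$ is

$$\kappa\{\pi(\cdot \mid x) \mid \pi_i(\cdot \mid x)\} = \int_{-i}^{i}\pi_i(\theta \mid x)\log\frac{\pi_i(\theta \mid x)}{\pi(\theta \mid x)}\,d\theta = \int_{-i}^{i}\pi_i(\theta \mid x)\log\frac{1}{2i\,p_i(x)}\,d\theta = -\log[F(x + i) - F(x - i)],$$

and the expected discrepancy is

$$\int_{-\infty}^{\infty}\kappa\{\pi(\cdot \mid x) \mid \pi_i(\cdot \mid x)\}\,p_i(x)\,dx = -\frac{1}{2i}\int_{-\infty}^{\infty}[F(x + i) - F(x - i)]\log[F(x + i) - F(x - i)]\,dx$$

$$= \int_{-\infty}^{-4} g(y, i)\,dy + \int_{-4}^{2} g(y, i)\,dy + \int_{2}^{\infty} g(y, i)\,dy \equiv J_1 + J_2 + J_3,$$

where, using the transformation $y = (x - i)/i$,

$$g(y, i) = -\tfrac{1}{2}\{F[(y + 2)i] - F(yi)\}\log\{F[(y + 2)i] - F(yi)\}.$$

Notice that for fixed $y \in (-4, 2)$, as $i \to \infty$,

$$F[(y + 2)i] - F(yi) \to \begin{cases} 0, & \text{if } y \in (-4, -2), \\ 1, & \text{if } y \in (-2, 0), \\ 0, & \text{if } y \in (0, 2). \end{cases}$$

Since $-v\log v \le e^{-1}$ for $0 \le v \le 1$, the dominated convergence theorem can
be applied to $J_2$, so that $J_2$ converges to 0 as $i \to \infty$. Next, when $i$ is large
enough and, for any $y \ge 2$,

$$F[(y + 2)i] - F(yi) \le \int_{yi}^{(y+2)i}\frac{1}{t^{1+\varepsilon}}\,dt = \frac{1}{\varepsilon}\left(\frac{1}{(yi)^{\varepsilon}} - \frac{1}{[(y + 2)i]^{\varepsilon}}\right) = \frac{(1 + 2/y)^{\varepsilon} - 1}{\varepsilon\,i^{\varepsilon}(y + 2)^{\varepsilon}} \le \frac{2^{\varepsilon}}{i^{\varepsilon}\,y\,(y + 2)^{\varepsilon}},$$

the last inequality holding since, for $0 \le v \le 1$, $(1 + v)^{\varepsilon} - 1 \le \varepsilon 2^{\varepsilon - 1}v$. Using
the fact that $-v\log v$ is monotone increasing in $0 \le v \le e^{-1}$, we have

$$J_3 \le -\frac{1}{2}\int_{2}^{\infty}\frac{2^{\varepsilon}}{i^{\varepsilon}\,y\,(y + 2)^{\varepsilon}}\log\left\{\frac{2^{\varepsilon}}{i^{\varepsilon}\,y\,(y + 2)^{\varepsilon}}\right\}dy,$$

which converges to 0 as $i \to \infty$. It may similarly be shown that $J_1$ converges
to 0 as $i \to \infty$. Consequently, $\{\pi_i(\theta \mid x)\}_{i=1}^{\infty}$ is expected logarithmically
convergent to $\pi(\theta \mid x)$, and thus, $\pi(\theta) = 1$ is permissible.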

APPENDIX B: DERIVATION OF RESULTS IN EXAMPLE 3 

Consider a location family, $p(x \mid \theta) = f(x - \theta)$, where $x \in \mathbb{R}$ and $\theta \in \Theta = \mathbb{R}$,
and $f$ is given by $f(x) = x^{-1}(\log x)^{-2}\,1_{(e,\infty)}(x)$. Choose $\pi(\theta) = 1$ and $\Theta_0 =
[a, b]$ such that $L \equiv b - a > 1$. Then,

rb 



Lp (x) 



f(x-6)d8 
1 



1 



log(— b — x) log(— a — x) 



1 - 

lo, 



1 



log(— a — x) ' 



if x < —b — e, 

if — b — e < x < —a — e, 
if x > —a — e. 



The logarithmic discrepancy between ttq(0 \ x) and ir(9 \ x) is 

K{ir(- I X) I 7To(- | X)} = t MO I X) bg X) d0 



vr (6» | a;) log 



7r(0|x 

1 



Lpo(x) 



d<9 = - log [Lp (» 



Then the expected discrepancy is 
E%k{h{- I x) | 7T (- | x)} 

Po(x)n{7r(- I x) I 7T (- I x)}g?X 



> 



Lpo(x)log[Lp (x)]dx 



oo 
-6- 



1 



1 



log(— b — x) log(— a — x) 
x log ' 
1 



1 



log(— b — x) log(— a — x) 



log 



log(t) log(t + L)J b llog(i) log(t + L) 



1 



dx 
1 



dt 



> 



1 /-oo r rt+L ^ 



" X } l0g {/ +i 



xlog (x) 



i ^Le l</t xlog 2 (x) 

Making the transformation y = t/L, the right-hand side equals 

5L(y) log{g L (y)}dy, 



dx > dt. 



where

$$g_L(y) = \int_{yL}^{(y+1)L}\frac{1}{x(\log x)^2}\,dx = \frac{1}{\log(yL)} - \frac{1}{\log((y + 1)L)}$$

$$= \frac{1}{\log(y) + \log(L)} - \frac{1}{\log(y) + \log(1 + 1/y) + \log(L)}$$

$$= \frac{\log(1 + 1/y)}{[\log(y) + \log(L)][\log(y + 1) + \log(L)]}.$$

Because $\log(1 + 1/y) > 1/(y + 1)$, for $y \ge e$,

$$g_L(y) \ge \frac{1}{(y + 1)[\log(y + 1) + \log(L)]^2}.$$

Since $-p\log(p)$ is an increasing function of $p \in (0, e^{-1})$, it follows that

$$E_0^x\,\kappa\{\pi(\cdot \mid x) \mid \pi_0(\cdot \mid x)\} \ge J_1 + J_2,$$

where

$$J_1 = \int_{e}^{\infty}\frac{\log(y + 1)}{(y + 1)[\log(y + 1) + \log(L)]^2}\,dy,$$

$$J_2 = \int_{e}^{\infty}\frac{2\log[\log(y + 1) + \log(L)]}{(y + 1)[\log(y + 1) + \log(L)]^2}\,dy.$$

Clearly $J_1 = \infty$ and $J_2$ is finite, so $E_0^x\,\kappa\{\pi(\cdot \mid x) \mid \pi_0(\cdot \mid x)\}$ does not exist.
First,



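$$\int_{\mathcal{X}_1 \times \mathcal{X}_2}\kappa\{\pi(\cdot \mid x_1, x_2) \mid \pi_0(\cdot \mid x_1, x_2)\}\,m_0(x_1, x_2)\,dx_1\,dx_2$$

$$= \int_{\mathcal{X}_1 \times \mathcal{X}_2}\int_{\Theta_0}\log\left\{\frac{\pi_0(\theta \mid x_1, x_2)}{\pi(\theta \mid x_1, x_2)}\right\}\pi_0(\theta)\,p(x_1, x_2 \mid \theta)\,d\theta\,dx_1\,dx_2$$

$$= \int_{\mathcal{X}_1 \times \mathcal{X}_2}\int_{\Theta_0}\log\left\{\frac{\pi_0(\theta)\,m(x_1, x_2)}{\pi(\theta)\,m_0(x_1, x_2)}\right\}\pi_0(\theta)\,p(x_1, x_2 \mid \theta)\,d\theta\,dx_1\,dx_2$$

$$= J_0 + \int_{\mathcal{X}_1 \times \mathcal{X}_2}\log\left\{\frac{m(x_1, x_2)}{m_0(x_1, x_2)}\right\}m_0(x_1, x_2)\,dx_1\,dx_2$$

$$= J_0 + \int_{\mathcal{X}_1}\int_{\mathcal{X}_2}\log\left\{\frac{m(x_2 \mid x_1)\,m(x_1)}{m_0(x_2 \mid x_1)\,m_0(x_1)}\right\}m_0(x_2 \mid x_1)\,m_0(x_1)\,dx_1\,dx_2$$

$$= J_0 + J_1 + J_2, \tag{C.1}$$

where $J_0 = \int_{\Theta_0}\log\{\pi_0(\theta)/\pi(\theta)\}\pi_0(\theta)\,d\theta$,

$$J_1 = \int_{\mathcal{X}_1}\log\left\{\frac{m(x_1)}{m_0(x_1)}\right\}m_0(x_1)\,dx_1,$$

$$J_2 = \int_{\mathcal{X}_1}\left(\int_{\mathcal{X}_2}\log\left\{\frac{m(x_2 \mid x_1)}{m_0(x_2 \mid x_1)}\right\}m_0(x_2 \mid x_1)\,dx_2\right)m_0(x_1)\,dx_1.$$

By assumption, $J_0$ is finite. Note that both $m_0(x_2 \mid x_1)$ and $m(x_2 \mid x_1) =
m(x_1, x_2)/m(x_1) = \int_{\Theta} p(x_2 \mid x_1, \theta)\,\pi(\theta \mid x_1)\,d\theta$ are proper densities. Because $\log(t)$
is concave on $(0, \infty)$, we have

$$J_2 \le \int_{\mathcal{X}_1}\log\left\{\int_{\mathcal{X}_2}\frac{m(x_2 \mid x_1)}{m_0(x_2 \mid x_1)}\,m_0(x_2 \mid x_1)\,dx_2\right\}m_0(x_1)\,dx_1 = 0.$$

By the same argument leading to (C.1), one can show that

$$\int_{\mathcal{X}_1}\kappa\{\pi(\cdot \mid x_1) \mid \pi_0(\cdot \mid x_1)\}\,m_0(x_1)\,dx_1 = J_0 + J_1.$$

The result is immediate.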

APPENDIX D: PROOF OF LEMMA 1 



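Clearly

$$I\{p_0 \mid \mathcal{M}^k\} \equiv \int_{\Theta_0} p_0(\theta)\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p_0(\theta \mid t_k)}{p_0(\theta)}\right]dt_k\,d\theta$$

$$= \int_{\Theta_0} p_0(\theta)\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p_0(t_k)}\right]dt_k\,d\theta$$

$$\le \sup_{\theta \in \Theta_0}\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p_0(t_k)}\right]dt_k. \tag{D.1}$$

Writing $p_0(t_k) = \int_{\Theta_0} p(t_k \mid \theta')\,p_0(\theta')\,d\theta'$, note by convexity of $[-\log]$ that

$$\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p_0(t_k)}\right]dt_k = -\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\int_{\Theta_0}\frac{p(t_k \mid \theta')}{p(t_k \mid \theta)}\,p_0(\theta')\,d\theta'\right]dt_k$$

$$\le -\int_{\mathcal{T}_k} p(t_k \mid \theta)\int_{\Theta_0}\log\left[\frac{p(t_k \mid \theta')}{p(t_k \mid \theta)}\right]p_0(\theta')\,d\theta'\,dt_k$$

$$= \int_{\Theta_0}\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p(t_k \mid \theta')}\right]dt_k\,p_0(\theta')\,d\theta'$$

$$\le \sup_{\theta' \in \Theta_0}\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p(t_k \mid \theta')}\right]dt_k.$$

Combining this with (D.1) yields

$$I\{p_0 \mid \mathcal{M}^k\} \le \sup_{\theta \in \Theta_0}\,\sup_{\theta' \in \Theta_0}\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p(t_k \mid \theta')}\right]dt_k,$$

from which the result follows.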
APPENDIX E: PROOF OF LEMMA 2

Let $p_0(\theta \mid t_k)$ be the posterior of $\theta$ under $p_0$, that is, $p(t_k \mid \theta)\,p_0(\theta)/p_0(t_k)$.
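Note that

$$I\{p_0 \mid \mathcal{M}^k\} = \int_{\Theta_0} p_0(\theta)\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p_0(\theta \mid t_k)}{p_0(\theta)}\right]dt_k\,d\theta$$

$$= \int_{\Theta_0} p_0(\theta)\int_{\mathcal{T}_k} p(t_k \mid \theta)\log\left[\frac{p(t_k \mid \theta)}{p_0(t_k)}\right]dt_k\,d\theta$$

$$= \int_{\Theta_0} p_0(\theta)\int_{\mathcal{T}_k} p(t_k \mid \theta)\log[p(t_k \mid \theta)]\,dt_k\,d\theta - \int_{\mathcal{T}_k} p_0(t_k)\log[p_0(t_k)]\,dt_k.$$

Because $I\{p_0 \mid \mathcal{M}^k\} \ge 0$,

$$\int_{\mathcal{T}_k} p_0(t_k)\log[p_0(t_k)]\,dt_k \le \int_{\Theta_0} p_0(\theta)\int_{\mathcal{T}_k} p(t_k \mid \theta)\log[p(t_k \mid \theta)]\,dt_k\,d\theta.$$

Condition 1 and the continuity of $p$ ensure that the right-hand side of the last
equation is bounded above, and condition 2 ensures that its left-hand side
is bounded below. Consequently, $I\{p_0 \mid \mathcal{M}^k\} < \infty$.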

APPENDIX F: PROOF OF THEOREM 7 

For any $p(\theta) \in \mathcal{P}_s$, denote the posterior corresponding to $p_0$ (the
restriction of $p$ to the compact set $\Theta_0$) by $p_0(\theta \mid t_k)$.

Step 1. We give an expansion of $I\{p_0 \mid \mathcal{M}^k\}$, defined by



I{po\M k }= [ p (9) [ p(t fe |0)log 

J 0o JT k 



Po(0 | t fc ) 



Po(9) 



dth. dO. 



Use the equality 

P0(6> | tfc) = P0 (e | tfc) TT* (6 | tfe) 7T*(fl I tfc) 7Tgfc(g) 

po(0) n* (9\t k )7r*(9\t k ) n* k (9) p (9) ' 
fk{9) 



where 
(F.l) 



CO (A) 

We have the decomposition 



l eo (0) and co(/ fc )= / fk(8) 

J0o 



d8. 



(F.2) 



I{po\M k } = Y,G jk , 
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Glk 
G2k 



e 



Po(0) / p(t fe |0)lo. 



e 



Po(0) / p(t fc |fl)log 



Po(# I tfc) 



G 3k = I Po(0) I p(t fc |0)log 

J6o JT k 



7r*(e\t k ) 

lT*(9\t k ) 



Gi k 

It is easy to see that 



6o 



Po(0) / p(t fc |0)log 



G-$k 



e 



Po(0)log 



Po(0) 
fk(0) 



d9dt k , 
dt k d6, 
dt k d0, 



dt k d9. 



do. 



From (F.1), $f_k(\theta)/\pi^*_{0k}(\theta) = c_0(f_k)$ on $\Theta_0$. Then,

$$G_{3k} = \log[c_0(f_k)]. \tag{F.3}$$

Clearly, 

Po(0) 



(FA) 



G 



4fe 



e 



de. 



Note that the continuity of $p(t_k \mid \theta)$ in $\theta$ and integrability will imply the
continuity of $f_k$. So, $\pi^*_{0k}$ is continuous and bounded, and $G_{4k}$ is finite. Since
$0 \le I\{p_0 \mid \mathcal{M}^k\} < \infty$, $G_{jk}$, $j = 1, 2, 3$ are all nonnegative and finite.
Step 2. We show that



(F.5) 

It is easy to see 



lim G lk = 

k— >oo 



7r*(g) /e P(tfc|r)p(r)dr 
Po(0|t fc ) p(0) / eo P(tfc|r)7r*(r)dr 



(F.6) 



tt*(0) /© p(t fc |r)p(r)dr 



[^(eoltfc)]" 1 . 



p(#) /eP^fckK*^)^ 
The definition of posterior consistency of $\pi^*$ is that, for any $\theta \in \Theta$ and any
$\varepsilon > 0$,



P*(|r -6\<e\t k )= f 7r*(r | t fc ) dr 1, 

J|t:|t-6»|<£> 



(F.7) 

/{t:|t-6»|< £ } 

in probability p(t& | 0) as A; —> oo. It is immediate that 



(F.8) 



PVR It \ _ S& P(^k I r)7r*(r)dr P 
-r (W t fc J - « ► 1, 
with probability $p(t_k \mid \theta)$, for $\theta \in \Theta_0$, as $k \to \infty$. Because both $\pi^*$ and $p$ are
continuous, for any $\varepsilon > 0$, there is small $\delta > 0$, such that

p(r) p{9) 



(F.9) 



7T*(r) 7T*(0) 

For such a 5, we could write 

7r*(0)/ 0o p(t fc |r)p(T)dr 



< e Vr € 6 n {9 - 5, 9 + S) 



(F.10) 
where 



Jlk + ^2fe, 



J 2 k 



P{0) JeP(tk\T)n*(T)dT 

tt*(9) Ie n{e-s,e+S)P( t k I t)(p(t)/tt*(t))7t*{t) dr 
P(0) J e p(t k \r)7r*(r)dT 

tt*(9) Je a n(e~8,e+6)c I T )(p( r )/ 7r *( T )) 7r *( r ) 



p(0) 

Clearly, (F.9) implies that 



/ e p(t* I r)7r*(r)dr 



Jik< 



p(0) 

p(0) 



Kg) 

7T*(0) 

Kg) 

7T*(0) 



e n(e-<5,6»+<5) 



7r*(r | t fc )dr, 



e o n(0-<5,0+<5) 



7r*(r | t fc )dT. 



(F.7) implies that, for the fixed 5 and 9 £ Qq, 



(F.ll) 



1-e 



TT^g) 

p(0) 



<«/ifc< 



l + e 



7T*(0) 



p(0) 



(F.12) 



< J 2k < 



7r*(r|t fc )dr-U0, 



with probability $p(t_k \mid \theta)$ as $k \to \infty$. Noting that $p(\theta)$ is continuous and
positive on $\Theta_0$, let $M_1 > 0$ and $M_2$ be the lower and upper bounds of $p$ on
$\Theta_0$. From (F.7),

M 2 tt*(0) 

M\p(9) Je n(9~s,9+s) c 

with probability p{t k | 9) as fe — > oo. Combining (F.6), (F.8) and (F.10)- 
(F.12), we know that 

(F.13) ^TT- 1 

P0{9 | tfe) 

with probability $p(t_k \mid \theta)$ as $k \to \infty$. It is easy to see that the left quantity of
(F.13) is bounded above and below, so the dominated convergence theorem
implies (F.5).

Step 3. We show that 

t fc ) 



(F.14) G 5k = f ir* (9) f p(t fc |0)log 



7T*(0|tfc) 



dt k d9^0 



as k — > oo. 
For any measurable set $A \subset \mathbb{R}$, denote $P^*(A \mid t_k) = \int_A \pi^*(\theta \mid t_k)\,d\theta$. Then,

7T* (9 | tfc) = P (t k | g)7Tg(g)/ P 5(t fc ) = J eP (t k I g)7T*(g)dg 

n*(9\t k ) p(t k \9)ir*(9)/p*(t k ) J eo p(t k \ 9)tt*(9) d9 
_ 1 | / eg P(t fc |gK(g)dg _ i | P*(®*\t k ) 



Thus, 



/eoPtt*l«)T*(»)"» J»*(So|t t ) 



For any < a < < oo, denote 
(F.15) T fe)a , 6 -|t fc .a< pjK(@o|tfc) < 
Clearly, if < e < M < oo, 

%e = ^fc,0,e U ?~ k ,e,M U %c,M,oo- 

We then have the decomposition for G^, 

(F-16) G 5 fc = G 5 fci + G 5 fc 2 + ^5^3, 

where 

G 5k3 = I 7T* (6)[ p(t fc |g)log(l+ gMM }dt»dg. 
The posterior consistency (F.8) implies that if 9 G 0q (the interior of 0o)> 

(F.i7) P *(es|t fe )-^o, 

in probability p(t& | 0) as k — ► oo. So (F.17) implies that, for any small e > 
and any fixed 9 G ©[], 

(F.18) / p(t fe |0)dt fc — >0 as/c^oo. 

For small e > 0, 

GW<log(l+e) / TToW / p(tfc|0)dt fe d0 

<log(l + e) <e. 
For any large $M > \max(\varepsilon, e - 1)$,

G 5fe2 <log(l + M) f 7r* (9) [ P (t k \6)dt k d6 

< log(l + M) f it* (6) f p(t fc | 9) dt k 69. 

Since $\pi_0^*$ is bounded on $\Theta_0$, (F.18) and the dominated convergence theorem
imply that

$$G_{5k2} \to 0 \quad \text{as } k \to \infty.$$

Also, 

G 5fc3 = / K(6) I p(t k | fl)log{ 1 )dt k dO 

= 7~T\ I P*(*fc) / P*(^|t fe )log[P*(e o |t fc )]d0dt fe 

= TTT / P*(t fe )P*(6 | t fc )log[P*(6 | t k )]dt k . 

Note that t fc G 7fc,Af,oo if and only if P*(9 | t fe ) < 1/(1 Also, -plog(p) 
is increasing for p G (0,1 /e) . This implies that 

-P*(G | t fc )log[P*(9 | t fc )] < j^Ml + M). 
Consequently, 

^,(,-)(l + M) ' 0g(1 + M) ' 

Now for fixed small $\varepsilon > 0$, we could choose $M > \max(\varepsilon, e - 1)$ large enough
so that $G_{5k3} \le \varepsilon$. For such fixed $\varepsilon$ and $M$, we know $G_{5k2} \to 0$ as $k \to \infty$.
Since $\varepsilon$ is arbitrary, (F.14) holds.
Step 4. We show that

$$\lim_{k \to \infty} G_{2k} = 0 \qquad \forall\,p \in \mathcal{P}_s. \tag{F.19}$$

Note that for any $p \in \mathcal{P}_s$, there is a constant $M > 0$, such that

su P ^<M. 

t€0o ^Ol 1 ") 
Since $\pi_0^*(\theta \mid t_k)/\pi^*(\theta \mid t_k) \ge 1$,

^ d ^dt k d9 = MG 5k . 



0<G 2k <M f tt* (6) f p(t fc |0)log 



7r*(e\t k ) 



Then, (F.14) implies (F.19) immediately. 

Step 5. It follows from (F.2) that for any prior p EV S , 

I{tt I M k } - I{ Po | M k ] 

'<{0\t k ) 

/e 
MO) 



7T ((9)log 



Steps 2 and 4 imply that 



-G lk -G 2k + [ MO) [ p(t fc |0)log 

de+ f Po (9)io g 



7r*(e\t k ) 

Po(0) 



d0. 



hm (/{vro | M fc } - /{p I M k }) = lim (- / 7r o (0)lo g r 
(F.20) + / p (#)log 

> — lim / ttq(0) log 



Po(0) 



MQ) 



dO 

d.9 



the last inequality holding since the second term is always nonnegative. 
Finally, 



lim / M0)log[M(6)]d6 

k->OOjQ 



hm / vro (9) log f 

lim / 7r o (0)log 
fc-»ooje 

lim / MO) lo S 



r/0 



co(/fc). 

/fc(0) /fc(go) 
/fe(0 O ) co(/ fc ) 

/*(*) 



r/0 



(i0 + lim log 



e 



fk(Qo)l fc ->°° 



7r o (0)log[/(0)]d0-log[co(/)] 



fk(0o) 
co(fk) 



ir (9)log[M0)]d9, 

e 

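the second to last line following from condition (i) and

$$\lim_{k \to \infty}\frac{c_0(f_k)}{f_k(\theta_0)} = \lim_{k \to \infty}\int_{\Theta_0}\frac{f_k(\theta)}{f_k(\theta_0)}\,d\theta = \int_{\Theta_0}\lim_{k \to \infty}\frac{f_k(\theta)}{f_k(\theta_0)}\,d\theta = \int_{\Theta_0} f(\theta)\,d\theta = c_0(f).$$

Consequently, the right-hand side of (F.20) is 0, completing the proof.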

APPENDIX G: PROOF OF THEOREM 8 

Let $x^{(k)} = \{x_1, \ldots, x_k\}$ consist of $k$ replications from the original uniform
distribution on the interval $(a_1(\theta), a_2(\theta))$. Let $t_1 = t_{k1} = \min\{x_1, \ldots, x_k\}$ and
$t_2 = t_{k2} = \max\{x_1, \ldots, x_k\}$. Clearly, $t_k \equiv (t_1, t_2)$ are sufficient statistics with
density

$$p(t_1, t_2 \mid \theta) = \frac{k(k - 1)(t_2 - t_1)^{k-2}}{[a_2(\theta) - a_1(\theta)]^k}, \qquad a_1(\theta) < t_1 < t_2 < a_2(\theta). \tag{G.1}$$

Choosing $\pi^*(\theta) = 1$, the corresponding posterior density of $\theta$ is

$$\pi^*(\theta \mid t_1, t_2) = \frac{1}{[a_2(\theta) - a_1(\theta)]^k\,m_k(t_1, t_2)}, \qquad a_2^{-1}(t_2) < \theta < a_1^{-1}(t_1), \tag{G.2}$$

where

$$m_k(t_1, t_2) = \int_{a_2^{-1}(t_2)}^{a_1^{-1}(t_1)}\frac{1}{[a_2(s) - a_1(s)]^k}\,ds. \tag{G.3}$$

Consider the transformation

$$y_1 = k\,(a_1^{-1}(t_1) - \theta) \quad \text{and} \quad y_2 = k\,(\theta - a_2^{-1}(t_2)), \tag{G.4}$$

or equivalently, $t_1 = a_1(\theta + y_1/k)$ and $t_2 = a_2(\theta - y_2/k)$.

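We first consider the frequentist asymptotic distribution of $(y_1, y_2)$. For
$\theta > \theta_0$, we know $a_1(\theta) < a_2(\theta)$. For any fixed $y_1 > 0$ and $y_2 > 0$, $a_1(\theta +
y_1/k) < a_2(\theta - y_2/k)$ when $k$ is large enough. From (G.1), the joint density
of $(y_1, y_2)$ is

$$p(y_1, y_2 \mid \theta) = \frac{(k - 1)}{k}\,\frac{a_1'(\theta + y_1/k)\,a_2'(\theta - y_2/k)}{[a_2(\theta) - a_1(\theta)]^k}\left[a_2\!\left(\theta - \frac{y_2}{k}\right) - a_1\!\left(\theta + \frac{y_1}{k}\right)\right]^{k-2}$$

$$= \frac{(k - 1)}{k}\,\frac{a_1'(\theta + y_1/k)\,a_2'(\theta - y_2/k)}{[a_2(\theta) - a_1(\theta)]^2}\left\{1 + \frac{a_2(\theta - y_2/k) - a_2(\theta) - a_1(\theta + y_1/k) + a_1(\theta)}{a_2(\theta) - a_1(\theta)}\right\}^{k-2}.$$

For fixed $\theta > \theta_0$, $y_1, y_2 > 0$, as $k \to \infty$,

$$p(y_1, y_2 \mid \theta) \to \frac{a_1'(\theta)\,a_2'(\theta)}{[a_2(\theta) - a_1(\theta)]^2}\exp\left\{-\frac{a_1'(\theta)\,y_1 + a_2'(\theta)\,y_2}{a_2(\theta) - a_1(\theta)}\right\} \equiv p^*(y_1, y_2 \mid \theta). \tag{G.5}$$

Consequently, as $k \to \infty$, the $y_i$'s have independent exponential distributions
with means $\lambda_i = [a_2(\theta) - a_1(\theta)]/a_i'(\theta)$.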
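With the transformation (G.4),

$$m_k(t_1, t_2) = \int_{\theta - y_2/k}^{\theta + y_1/k}\frac{1}{[a_2(s) - a_1(s)]^k}\,ds = \frac{1}{k}\int_{-y_2}^{y_1}\frac{1}{[a_2(\theta + v/k) - a_1(\theta + v/k)]^k}\,dv. \tag{G.6}$$

So, for any fixed $y_1, y_2 > 0$ as $k \to \infty$,

$$k\,[a_2(\theta) - a_1(\theta)]^k\,m_k(t_1, t_2) \to \int_{-y_2}^{y_1}\exp\left\{-\frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\,v\right\}dv$$

$$= \frac{a_2(\theta) - a_1(\theta)}{a_2'(\theta) - a_1'(\theta)}\exp\left\{\frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\,y_2\right\}\left[1 - \exp\left\{-\frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\,(y_1 + y_2)\right\}\right].$$

Then, for fixed $\theta > \theta_0$ as $k \to \infty$,

$$\int\log(\pi^*(\theta \mid t_1, t_2))\,p(t_1, t_2 \mid \theta)\,dt_1\,dt_2 - \log(k) \to \log\left\{\frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\right\} + J_1(\theta) + J_2(\theta), \tag{G.7}$$

where

$$J_1(\theta) = -\frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\int_0^{\infty}\!\!\int_0^{\infty} y_2\,p^*(y_1, y_2 \mid \theta)\,dy_1\,dy_2,$$

$$J_2(\theta) = -\int_0^{\infty}\!\!\int_0^{\infty}\log\left\{1 - \exp\left[-\frac{a_2'(\theta) - a_1'(\theta)}{a_2(\theta) - a_1(\theta)}\,(y_1 + y_2)\right]\right\}p^*(y_1, y_2 \mid \theta)\,dy_1\,dy_2.$$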
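It follows from (G.5) that

$$J_1(\theta) = -b_2, \qquad J_2(\theta) = -E\log\{1 - e^{-b_1 V_1}e^{-b_2 V_2}\},$$

where $V_1$ and $V_2$ are i.i.d. with the standard exponential distribution. Then,

$$J_2(\theta) = \sum_{j=1}^{\infty}\frac{1}{j}\,E(e^{-j b_1 V_1})\,E(e^{-j b_2 V_2}) = \sum_{j=1}^{\infty}\frac{1}{j\,(b_1 j + 1)(b_2 j + 1)} = \frac{1}{b_1 - b_2}\sum_{j=1}^{\infty}\frac{1}{j}\left(\frac{1}{j + 1/b_1} - \frac{1}{j + 1/b_2}\right). \tag{G.8}$$

Note that the digamma function $\psi(z)$ satisfies the equation

$$\sum_{j=1}^{\infty}\frac{1}{j\,(j + z)} = \frac{\psi(z + 1) + \gamma}{z}, \tag{G.9}$$

for $z > 0$, where $\gamma$ is the Euler-Mascheroni constant (see, e.g., Boros and
Moll [14]). Equations (G.8) and (G.9) imply that

$$J_2(\theta) = \frac{1}{b_1 - b_2}\left\{b_1\left[\psi\!\left(\frac{1}{b_1} + 1\right) + \gamma\right] - b_2\left[\psi\!\left(\frac{1}{b_2} + 1\right) + \gamma\right]\right\} = \gamma + \frac{1}{b_1 - b_2}\left[b_1\psi\!\left(\frac{1}{b_1} + 1\right) - b_2\psi\!\left(\frac{1}{b_2} + 1\right)\right].$$

Using the fact that $\psi(z + 1) = \psi(z) + 1/z$,

$$J_1(\theta) + J_2(\theta) = \gamma - b_2 + \frac{1}{b_1 - b_2}\left[b_1^2 + b_1\psi\!\left(\frac{1}{b_1}\right) - b_2^2 - b_2\psi\!\left(\frac{1}{b_2}\right)\right]$$

$$= \gamma + b_1 + \frac{1}{b_1 - b_2}\left[b_1\psi\!\left(\frac{1}{b_1}\right) - b_2\psi\!\left(\frac{1}{b_2}\right)\right]. \tag{G.10}$$

The result follows from (G.7) and (G.10).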